
A multi-agent system for intelligent cross-domain data queries, built with LangGraph, Databricks Genie, Lakebase, and Claude models/skills on the Databricks Platform.
Organizations struggle to query data across multiple domains and data sources because doing so requires deep SQL expertise and knowledge of complex data schemas. Databricks Unified Chat addresses this with an intelligent multi-agent system that routes natural language queries to the appropriate data sources, synthesizes results, and delivers comprehensive answers.
Built on LangGraph, Databricks Genie and Lakebase, this solution enables business users to ask questions spanning multiple data domains without needing to understand the underlying data architecture or write complex SQL queries.
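The route, query, and synthesize flow described above can be illustrated with a minimal, standard-library sketch. This is not the project's implementation: the domain names, the keyword table, and the functions `route`, `query_domain`, and `synthesize` are all hypothetical, and simple keyword matching stands in for whatever routing the real agents perform.

```python
from dataclasses import dataclass


@dataclass
class DomainResult:
    domain: str
    answer: str


# Hypothetical registry: keywords that hint a data domain is relevant.
DOMAIN_KEYWORDS = {
    "patients": {"patient", "demographics", "outcomes"},
    "sales": {"revenue", "sales", "customers"},
}


def route(query: str) -> list[str]:
    """Return the domains whose keywords appear in the query (sorted)."""
    words = set(query.lower().replace("?", "").split())
    return sorted(d for d, kws in DOMAIN_KEYWORDS.items() if words & kws)


def query_domain(domain: str, query: str) -> DomainResult:
    """Stand-in for a call to a per-domain data source."""
    return DomainResult(domain=domain, answer=f"[{domain}] result for: {query}")


def synthesize(results: list[DomainResult]) -> str:
    """Combine per-domain answers into a single response."""
    if not results:
        return "No matching data domain found."
    return "\n".join(r.answer for r in results)


def answer(query: str) -> str:
    """Route the query, query each matched domain, then merge the answers."""
    return synthesize([query_domain(d, query) for d in route(query)])
```

A question that mentions both sales and patient terms would be routed to both hypothetical domains, and the two answers would be joined into one response.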
Why use DBX-UnifiedChat?
- Accuracy of Answer
  - Validated with customers and partners, e.g., on tumor outcome data analysis.
- Explanation and Curation
  - Results are curated and explained through the returned SQL and its associated explanations.
- Speed
  - Optimized through parallel execution, caching, token reduction, and architecture design.
  - Achieves a time to first token (TTFT) of 1-2 seconds.
  - For complex cross-domain queries, it takes one-third to one-half of the time of the no/low-code custom agent solution.
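As a rough illustration of two of the speed techniques listed above (parallel execution and caching), the sketch below fans a question out to several domains concurrently and caches repeated lookups. It assumes nothing about how the project itself does this: `fetch_domain_answer` and `fan_out` are hypothetical names, and the `time.sleep` call merely simulates backend latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=256)
def fetch_domain_answer(domain: str, question: str) -> str:
    """Hypothetical slow per-domain lookup; results are cached by arguments."""
    time.sleep(0.05)  # simulate network / warehouse latency
    return f"{domain}: answer to {question!r}"


def fan_out(domains: list[str], question: str) -> list[str]:
    """Query all domains concurrently and return answers in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(domains))) as pool:
        return list(pool.map(lambda d: fetch_domain_answer(d, question), domains))
```

With this shape, the first call costs roughly one lookup's latency instead of the sum across domains, and an identical follow-up call is served from the cache.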

The system uses a multi-agent architecture powered by LangGraph:
The system leverages:
See Architecture Documentation for detailed design.
Click here to view the Interactive Presentation Slides
Please see the Development Guide for detailed instructions on the three supported workflows:
# Quick clone
git clone https://github.com/databricks-solutions/dbx-unifiedchat.git
cd dbx-unifiedchat
# See docs/DEVELOPMENT_GUIDE.md for next steps
Set up your environment variables in .env:
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-token
# Genie Configuration
GENIE_SPACE_IDS=space_id_1,space_id_2
# Vector Search Configuration
VECTOR_SEARCH_ENDPOINT=your-endpoint
VECTOR_SEARCH_INDEX=catalog.schema.index_name
# SQL Configuration
SQL_WAREHOUSE_ID=your-warehouse-id
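Before running the agent, it can help to check that these variables are actually set. The following is a minimal sketch, not the project's own `config.py`: the `Settings` class and `load_settings` function are hypothetical, and the code assumes the variables have already been exported into the process environment (for example by a tool such as python-dotenv), since it does not parse the `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
    "GENIE_SPACE_IDS",
    "VECTOR_SEARCH_ENDPOINT",
    "VECTOR_SEARCH_INDEX",
    "SQL_WAREHOUSE_ID",
)


@dataclass(frozen=True)
class Settings:
    host: str
    token: str
    genie_space_ids: list[str]
    vector_search_endpoint: str
    vector_search_index: str
    sql_warehouse_id: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ); fail on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    # GENIE_SPACE_IDS is a comma-separated list; drop blanks and whitespace.
    space_ids = [s.strip() for s in env["GENIE_SPACE_IDS"].split(",") if s.strip()]
    return Settings(
        host=env["DATABRICKS_HOST"].rstrip("/"),
        token=env["DATABRICKS_TOKEN"],
        genie_space_ids=space_ids,
        vector_search_endpoint=env["VECTOR_SEARCH_ENDPOINT"],
        vector_search_index=env["VECTOR_SEARCH_INDEX"],
        sql_warehouse_id=env["SQL_WAREHOUSE_ID"],
    )
```

Failing fast with a list of every missing variable tends to be easier to debug than an authentication error surfacing later from a remote call.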
# Test the agent with a sample query
python -m src.multi_agent.main --query "Show me patient demographics by region"
cd etl/
python local_dev_etl.py --all --sample-size 10
To test, run notebooks/test_agent_databricks.py in your Databricks workspace.
To deploy, run notebooks/deploy_agent.py in your Databricks workspace.
See Deployment Guide for complete instructions.
.
├── etl/                          # ETL pipeline for metadata enrichment
│   ├── local_dev_etl.py          # Local ETL testing script
│   └── *.py                      # ETL notebooks for Databricks
├── src/multi_agent/              # Core agent system
│   ├── agents/                   # Agent implementations
│   ├── core/                     # Graph, state, and configuration
│   ├── tools/                    # Agent tools and utilities
│   └── main.py                   # Entry point for local execution
├── notebooks/                    # Databricks notebooks
│   ├── test_agent_databricks.py  # Testing notebook
│   └── deploy_agent.py           # Deployment notebook
├── tests/                        # Test suite
│   ├── unit/                     # Unit tests
│   ├── integration/              # Integration tests
│   └── e2e/                      # End-to-end tests
├── docs/                         # Documentation
│   ├── ARCHITECTURE.md           # System architecture
│   ├── DEPLOYMENT.md             # Deployment guide
│   ├── LOCAL_DEVELOPMENT.md      # Local development guide
│   └── CONFIGURATION.md          # Configuration reference
├── config/                       # Configuration files
│   ├── dev_config.yaml           # Development configuration
│   └── prod_config.yaml          # Production configuration
└── pyproject.toml                # Python package configuration
# Run all tests
pytest
# Run specific test suites
pytest tests/unit/ # Fast unit tests
pytest tests/integration/ # Integration tests with Databricks
pytest tests/e2e/ # End-to-end system tests
# Run with coverage
pytest --cov=src.multi_agent tests/
See Testing Guide for detailed testing documentation.
This repository supports three configuration modes:
| Configuration | Environment | Purpose |
|---|---|---|
| .env + config.py | Local development | Fast iteration with local Python |
| dev_config.yaml | Databricks notebooks | Testing with real Databricks services |
| prod_config.yaml | Model Serving | Production deployment configuration |
All three configurations use the same agent code from src/multi_agent/.
See Configuration Guide for details.
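One way to picture the three modes is as a lookup from a mode name to the file that configures it. The sketch below is illustrative only: the mode names (`local`, `dev`, `prod`) and the `config_path` helper are hypothetical, and the file locations are assumed from the project structure listing rather than confirmed by the Configuration Guide.

```python
from pathlib import Path

# File names come from the configuration table; the mode keys are hypothetical.
CONFIG_SOURCES = {
    "local": ".env",
    "dev": "config/dev_config.yaml",
    "prod": "config/prod_config.yaml",
}


def config_path(mode: str, repo_root: str = ".") -> Path:
    """Return the configuration file path for a mode, or raise on unknown modes."""
    try:
        relative = CONFIG_SOURCES[mode]
    except KeyError:
        expected = ", ".join(sorted(CONFIG_SOURCES))
        raise ValueError(f"Unknown mode {mode!r}; expected one of: {expected}") from None
    return Path(repo_root) / relative
```

Keeping the mapping in one place means the agent code stays identical across modes and only the resolved path changes.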
from src.multi_agent.main import run_agent
# Simple query
result = run_agent("What are the top 10 customers by revenue?")
# Cross-domain query
result = run_agent("Show me patient outcomes correlated with treatment protocols")
# Complex analytical query
result = run_agent("Compare sales performance across regions for the last quarter")
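# Deploy to Databricks Model Serving
from databricks import agents

# Deploy a model version that has already been registered with MLflow.
# model_name is typically a fully qualified Unity Catalog name
# (catalog.schema.model); adjust the placeholder below for your workspace.
agents.deploy(
    model_name="dbx-unifiedchat-agent",
    model_version=1,
    endpoint_name="unified-chat-endpoint",
)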
| Component | Description |
|---|---|
| Multi-Agent System | LangGraph-based agent orchestration with specialized agents |
| Genie Integration | Native integration with Databricks Genie spaces |
| Vector Search | Semantic routing and metadata retrieval |
| ETL Pipeline | Metadata enrichment and index building |
| Deployment Tools | Notebooks and scripts for Databricks deployment |
| Test Suite | Comprehensive unit, integration, and E2E tests |
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
For security vulnerabilities, please see our Security Policy.
The content provided here is for reference and educational purposes only. It is not officially supported by Databricks under any Service Level Agreements (SLAs). All materials are provided AS IS, without any guarantees or warranties, and are not intended for production use without proper review and testing.
The source code in this project is provided under the Databricks License. All third-party libraries included or referenced are subject to their respective licenses. See NOTICE.md for third-party license information.
If you encounter issues while using this content, please open a GitHub Issue in this repository. Issues will be reviewed as time permits, but there are no formal SLAs for support.
(c) 2026 Databricks, Inc. All rights reserved.
The source in this project is provided subject to the Databricks License. See LICENSE.md for details.
Third-Party Licenses: This project depends on various third-party packages. See NOTICE.md for complete attribution and license information.
Built with:
This repository is part of the Databricks Field Solutions collection - a curated set of real-world implementations, demonstrations, and technical content created by Databricks field engineers to share practical expertise and best practices.