DABs Definition
You'll deploy a Lakeflow Connect ingestion pipeline with a classic compute gateway using DABs in ~15 min.
What you'll build
A Lakeflow Connect ingestion pipeline defined as code with Databricks Asset Bundles (DABs). The definition has two parts: a gateway pipeline on classic compute, so it can reach the source over the network, and a serverless ingestion pipeline that writes into UC tables.
Prerequisites
- A Unity Catalog connection to the source database (see Manage connections).
- Databricks CLI installed and configured.
- A service principal or user with permissions to create pipelines in the target workspace.
- Check feature availability for your data source.
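The CLI and connection prerequisites above can be checked from a terminal before you write any YAML. This is a minimal sketch, assuming the databricks CLI is on your PATH and already authenticated. The function name check_prereqs is hypothetical (it is not part of the CLI), and my_pg_connection in the usage line is a placeholder connection name.

```shell
# Hypothetical pre-flight helper; the function name is illustrative only.
# Usage: check_prereqs <uc-connection-name>
check_prereqs() {
  # Fails if the CLI is not installed or not on PATH.
  databricks --version || { echo "Databricks CLI not found" >&2; return 1; }

  # Prints the identity the CLI is currently authenticated as.
  databricks current-user me || { echo "CLI is not authenticated" >&2; return 1; }

  # Fails if the Unity Catalog connection does not exist or is not visible to you.
  databricks connections get "$1" || {
    echo "Connection '$1' not found" >&2
    return 1
  }

  echo "Prerequisites look OK"
}
```

Call it as check_prereqs my_pg_connection. It does not check pipeline-creation permissions or feature availability for your source; those still need a manual look.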
Steps
1. Define the pipeline in your DABs YAML
The example below comes from Create pipeline for PostgreSQL. The one thing it adds over the default is the explicit classic compute block on the gateway pipeline; that is the whole reason to go the DABs route.
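For other connectors (MySQL, SQL Server, Salesforce, and more), see the connector pages listed in the left-hand navigation of the Databricks docs.
Add this to your bundle configuration. It defines both the variables block and the resources block: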
variables:
  gateway_name:
    default: postgresql_gateway_pipeline
  pipeline_name:
    default: postgresql_pipeline
  dest_catalog:
    default: development
  dest_schema:
    default: c360_source

resources:
  pipelines:
    gateway:
      name: ${var.gateway_name}
      gateway_definition:
        connection_name: <my-connection>
        gateway_storage_catalog: development
        gateway_storage_schema: ${var.dest_schema}
        gateway_storage_name: ${var.gateway_name}
      target: ${var.dest_schema}
      catalog: ${var.dest_catalog}
      clusters:
        - label: default
          # AWS instance types. For Azure use Standard_DS3_v2 / Standard_DS4_v2
          driver_node_type_id: r5.xlarge
          # AWS instance types. For Azure use Standard_E8ds_v4 / Standard_E16ds_v4
          node_type_id: m5.xlarge
          autoscale:
            min_workers: 2
            max_workers: 4
            mode: ENHANCED

    pipeline_postgresql:
      name: ${var.pipeline_name}
      ingestion_definition:
        ingestion_gateway_id: ${resources.pipelines.gateway.id}
        source_type: POSTGRESQL
        objects:
          - table:
              source_catalog: your_database
              source_schema: public
              source_table: orders
              destination_catalog: ${var.dest_catalog}
              destination_schema: ${var.dest_schema}
          - schema:
              source_catalog: your_database
              source_schema: public
              destination_catalog: ${var.dest_catalog}
              destination_schema: ${var.dest_schema}
        source_configurations:
          - catalog:
              source_catalog: your_database
              postgres:
                slot_config:
                  slot_name: db_slot
                  publication_name: db_pub
      target: ${var.dest_schema}
      catalog: ${var.dest_catalog}
The provider must use a workspace-level connection. The identity that deploys the pipeline, whether a user or a service principal, needs workspace-admin access or explicit permission to create pipelines. See REST API: Create a pipeline for attribute definitions.
2. Replace placeholders
- Replace <my-connection> with the name of your UC connection.
- Replace your_database, public, and orders with your actual source database, schema, and table names.
- Adjust dest_catalog and dest_schema to match your governance model.
- For Azure, swap the instance types to the Azure equivalents noted in the comments.
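The dest_catalog and dest_schema variables do not have to be edited in the file. Bundle variables can also be overridden when you deploy. This is a sketch under assumptions: the function name deploy_with_overrides is hypothetical, and prod is a placeholder catalog that must already exist in your metastore.

```shell
# Option 1: override bundle variables on the command line with --var.
# $1 = destination catalog, $2 = destination schema (placeholder values).
deploy_with_overrides() {
  databricks bundle deploy --var="dest_catalog=$1,dest_schema=$2"
}

# Option 2: set BUNDLE_VAR_<name> environment variables instead, then deploy.
# export BUNDLE_VAR_dest_catalog=prod
# export BUNDLE_VAR_dest_schema=c360_source
# databricks bundle deploy
```

For example, deploy_with_overrides prod c360_source. Note that gateway_storage_catalog in the step 1 YAML is a literal value, so neither option changes it.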
3. Deploy the pipeline
databricks bundle deploy
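It can help to validate the bundle before deploying, so that YAML or schema mistakes surface before anything is created in the workspace. A minimal sketch, assuming you run it from the bundle root directory; the wrapper name validate_and_deploy is hypothetical.

```shell
# Validate the bundle configuration first; deploy only if validation passes.
validate_and_deploy() {
  databricks bundle validate && databricks bundle deploy
}
```

Because of the &&, a failed validation stops the function before the deploy command runs.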
4. Start the pipeline
databricks bundle run pipeline_postgresql
Verify
- In the Databricks workspace, navigate to Workflows > Delta Live Tables.
- Confirm both the gateway pipeline and the ingestion pipeline show a Running or Completed status.
- Navigate to Catalog > the target catalog and schema. Confirm the ingested tables appear with data.
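The same checks can be approximated from the CLI. This is a sketch, not a complete verification: the function name verify_ingestion is hypothetical, and the catalog and schema arguments are whatever you set for dest_catalog and dest_schema.

```shell
# $1 = destination catalog, $2 = destination schema.
verify_ingestion() {
  # Show the resources this bundle deployed; both pipelines should be listed.
  databricks bundle summary || return 1

  # List tables in the destination schema; ingested tables should appear here.
  databricks tables list "$1" "$2"
}
```

With the defaults from step 1, run verify_ingestion development c360_source. This confirms the tables exist but not that they contain rows, so a quick query against one table is still worth doing.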
Where people trip
Gateway pipeline fails to start
The gateway runs on classic compute in your VPC. Check the cluster config (instance types, autoscale settings) and confirm the workspace is allowed to launch clusters with those instance types.
Ingestion pipeline cannot connect to the source
The gateway cluster needs network access to the source database. Check VPC peering, security group rules, and firewall settings, and confirm the connection object still has valid credentials.
Permission error when creating the pipeline
Whoever runs databricks bundle deploy, user or service principal, needs workspace-admin access or explicit permission to create pipelines. Check the workspace admin settings.
Next
- Do next: Build the first pipeline
- Learn why: Unity Catalog foundations
- Reference: Lakeflow Connect (Databricks docs)