
DABs Definition

You'll deploy a Lakeflow Connect ingestion pipeline with a classic compute gateway using DABs in ~15 min.

Prereqs: Create ingestion pipeline overview, Create connection, DABs CLI

What you'll build

A Lakeflow Connect ingestion pipeline defined as code using Databricks Asset Bundles (DABs). The definition includes two pipelines: a gateway pipeline on classic compute, which provides network access to the source system, and a serverless ingestion pipeline that writes into Unity Catalog (UC) tables.

Prerequisites

  • A Unity Catalog connection to the source database (see Create connection).
  • Databricks CLI installed and configured.
  • A service principal or user with permissions to create pipelines in the target workspace.
  • Check feature availability for your data source.

Steps

1. Define the pipeline in your DABs YAML

The example below is based on Create pipeline for PostgreSQL. The key difference from the default setup is the explicit classic compute specification on the gateway pipeline.

Check the Databricks documentation left panel for other connectors (MySQL, SQL Server, Salesforce, etc.).

Add this to your bundle configuration file (databricks.yml). It declares both the variables and the pipeline resources:

variables:
  gateway_name:
    default: postgresql_gateway_pipeline
  pipeline_name:
    default: postgresql_pipeline
  dest_catalog:
    default: development
  dest_schema:
    default: c360_source

resources:
  pipelines:
    gateway:
      name: ${var.gateway_name}
      gateway_definition:
        connection_name: <my-connection>
        gateway_storage_catalog: development
        gateway_storage_schema: ${var.dest_schema}
        gateway_storage_name: ${var.gateway_name}
      target: ${var.dest_schema}
      catalog: ${var.dest_catalog}

      clusters:
        - label: default
          # AWS instance type; for Azure use Standard_DS3_v2 / Standard_DS4_v2
          driver_node_type_id: r5.xlarge
          # AWS instance type; for Azure use Standard_E8ds_v4 / Standard_E16ds_v4
          node_type_id: m5.xlarge
          autoscale:
            min_workers: 2
            max_workers: 4
            mode: ENHANCED

    pipeline_postgresql:
      name: ${var.pipeline_name}
      ingestion_definition:
        ingestion_gateway_id: ${resources.pipelines.gateway.id}

        source_type: POSTGRESQL
        objects:
          - table:
              source_catalog: your_database
              source_schema: public
              source_table: orders
              destination_catalog: ${var.dest_catalog}
              destination_schema: ${var.dest_schema}
          - schema:
              source_catalog: your_database
              source_schema: public
              destination_catalog: ${var.dest_catalog}
              destination_schema: ${var.dest_schema}
        source_configurations:
          - catalog:
              source_catalog: your_database
              postgres:
                slot_config:
                  slot_name: db_slot
                  publication_name: db_pub
      target: ${var.dest_schema}
      catalog: ${var.dest_catalog}
Danger

The provider must use a workspace-level connection. The service principal deploying the pipeline needs workspace-admin access. See the REST API reference (Create a pipeline) for attribute definitions.

2. Replace placeholders

  • Replace <my-connection> with the name of your UC connection.
  • Replace your_database, public, and orders with your actual source database, schema, and table names.
  • Adjust dest_catalog and dest_schema to match your governance model.
  • For Azure, swap the instance types to the Azure equivalents noted in the comments.
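Before deploying, it can be worth confirming that no sample placeholder is left in the file. The following is a minimal sketch, not part of the Databricks tooling: find_placeholders is a hypothetical helper name, and it only looks for <my-connection> and your_database, because public and orders may be legitimate names in your source.

```shell
# Hypothetical helper: print every line of the given file that still contains
# one of the sample placeholders from the example YAML.
# Exit status follows grep: 0 if a placeholder was found, 1 if the file is clean.
find_placeholders() {
  grep -nE '<my-connection>|your_database' "$1"
}
```

For example, assuming the bundle file is named databricks.yml, `find_placeholders databricks.yml && echo "Replace the placeholders listed above"` prints the offending lines with their line numbers, or nothing when the file is clean.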

3. Deploy the pipeline

databricks bundle deploy
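Validating the bundle first can surface YAML and reference errors before any pipeline is created. A minimal sketch, assuming the Databricks CLI is on your PATH: validate_then_deploy is a hypothetical wrapper name, and it passes its arguments unchanged to both commands.

```shell
# Hypothetical wrapper: run `databricks bundle validate` and deploy only if
# validation succeeds. All arguments are forwarded to both commands.
validate_then_deploy() {
  databricks bundle validate "$@" && databricks bundle deploy "$@"
}
```

Call it with no arguments to use the default target, or as `validate_then_deploy -t dev` if your bundle defines a target named dev (an assumption; the example YAML above does not define targets).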

4. Start the pipeline

databricks bundle run pipeline_postgresql

Verify

  1. In the Databricks workspace, navigate to Workflows > Delta Live Tables.
  2. Confirm both the gateway pipeline and the ingestion pipeline show a Running or Completed status.
  3. Navigate to Catalog > the target catalog and schema. Confirm the ingested tables appear with data.

Troubleshoot

Gateway pipeline fails to start

The gateway runs on classic compute in your VPC. Verify the cluster configuration (instance types, autoscale settings) and that the workspace has permission to launch clusters with those instance types.

Ingestion pipeline cannot connect to the source

The gateway cluster must have network access to the source database. Verify VPC peering, security group rules, and firewall settings. The connection object must also have valid credentials.
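To separate network problems from credential problems, a plain TCP check against the database port can help. This is a rough sketch, assuming bash and the coreutils timeout command are available; can_reach is a hypothetical helper name. The result is only meaningful when the check runs from the same network as the gateway cluster (for example, from a machine or cluster in the same VPC and subnet), not from your laptop.

```shell
# Hypothetical helper: try to open a TCP connection to host $1 on port $2,
# giving up after 5 seconds. Exit status 0 means the port accepted the
# connection; any other status means it was refused, filtered, or timed out.
can_reach() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, `can_reach db.example.internal 5432 && echo reachable || echo unreachable`, where db.example.internal is a placeholder host and 5432 is the default PostgreSQL port. A successful check shows only that the port is open; it does not validate the credentials stored in the connection object.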

Permission error when creating the pipeline

The service principal or user running databricks bundle deploy needs workspace-admin access or explicit permission to create pipelines. Verify permissions in the workspace admin settings.

Next