DABs Definition

In about 15 minutes, you'll use DABs to deploy a Lakeflow Connect ingestion pipeline whose gateway runs on classic compute.

Prereqs: Databases and SaaS ingestion overview, DABs CLI

What you'll build

A Lakeflow Connect ingestion pipeline defined as code with Databricks Asset Bundles (DABs). The definition has two parts: a gateway pipeline on classic compute, which lets it reach the source database over your network, and a serverless ingestion pipeline that writes into Unity Catalog tables.

Prerequisites

A Unity Catalog connection to the source database (see Manage connections).
Databricks CLI installed and configured.
A service principal or user with permissions to create pipelines in the target workspace.
Check feature availability for your data source.

Steps

1. Define the pipeline in your DABs YAML

The example below comes from Create pipeline for PostgreSQL. The one thing it adds over the default is the explicit classic compute block on the gateway pipeline; that is the whole reason to go the DABs route.

For other connectors (MySQL, SQL Server, Salesforce, and more), see the corresponding pages in the left navigation panel of the Databricks docs.

Add this to your DABs resources block:

variables:
  gateway_name:
    default: postgresql_gateway_pipeline
  pipeline_name:
    default: postgresql_pipeline
  dest_catalog:
    default: development
  dest_schema:
    default: c360_source

resources:
  pipelines:
    gateway:
      name: ${var.gateway_name}
      gateway_definition:
        connection_name: <my-connection>
        gateway_storage_catalog: ${var.dest_catalog}
        gateway_storage_schema: ${var.dest_schema}
        gateway_storage_name: ${var.gateway_name}
      target: ${var.dest_schema}
      catalog: ${var.dest_catalog}

      clusters:
        - label: default
          # AWS instance types. For Azure use Standard_DS3_v2 / Standard_DS4_v2
          driver_node_type_id: r5.xlarge
          # AWS instance types. For Azure use Standard_E8ds_v4 / Standard_E16ds_v4
          node_type_id: m5.xlarge
          autoscale:
            min_workers: 2
            max_workers: 4
            mode: ENHANCED

    pipeline_postgresql:
      name: ${var.pipeline_name}
      ingestion_definition:
        ingestion_gateway_id: ${resources.pipelines.gateway.id}

        source_type: POSTGRESQL
        objects:
          - table:
              source_catalog: your_database
              source_schema: public
              source_table: orders
              destination_catalog: ${var.dest_catalog}
              destination_schema: ${var.dest_schema}
          - schema:
              source_catalog: your_database
              source_schema: public
              destination_catalog: ${var.dest_catalog}
              destination_schema: ${var.dest_schema}
        source_configurations:
          - catalog:
              source_catalog: your_database
              postgres:
                slot_config:
                  slot_name: db_slot
                  publication_name: db_pub
      target: ${var.dest_schema}
      catalog: ${var.dest_catalog}

Danger

The provider must use a workspace-level connection. The service principal deploying the pipeline needs workspace-admin access. See REST API: Create a pipeline for attribute definitions.
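
For orientation, here is a minimal sketch of a bundle root file that would pick up the pipeline definition above. It assumes the definition is saved in a file under a resources/ folder next to databricks.yml. The bundle name, the folder layout, and the workspace URL placeholder are illustrative assumptions, not part of the original example.

```yaml
# databricks.yml (bundle root): illustrative sketch under the assumptions above
bundle:
  name: lakeflow_connect_postgresql    # hypothetical bundle name

include:
  - resources/*.yml                    # assumes the variables and pipelines shown above live here

workspace:
  host: https://<your-workspace-url>   # placeholder, replace with your workspace URL
```

Files matched by include are merged into the root configuration, so the variables and resources blocks can stay together in one resource file.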

2. Replace placeholders

Replace <my-connection> with the name of your Unity Catalog connection.
Replace your_database, public, and orders with your actual source database, schema, and table names.
Adjust dest_catalog and dest_schema to match your governance model.
For Azure, swap the instance types to the Azure equivalents noted in the comments.
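
If the destination differs per environment, one option is to leave the defaults in the variables block alone and override them per target. The fragment below is a sketch of that approach. The target names and the production catalog name are hypothetical and should follow whatever your governance model prescribes.

```yaml
# Fragment of databricks.yml: per-target overrides for the variables defined above
targets:
  dev:
    default: true
    variables:
      dest_catalog: development    # same as the variable default in the example
      dest_schema: c360_source
  prod:                            # hypothetical target name
    variables:
      dest_catalog: production     # hypothetical catalog name
      dest_schema: c360_source
```

Deploying with a target selected then resolves ${var.dest_catalog} and ${var.dest_schema} to that target's values wherever the pipeline definition references them.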

3. Deploy the pipeline

databricks bundle deploy

4. Start the pipeline

databricks bundle run pipeline_postgresql

Verify

In the Databricks workspace, navigate to Workflows > Delta Live Tables.
Confirm both the gateway pipeline and the ingestion pipeline show a Running or Completed status.
Navigate to Catalog > the target catalog and schema. Confirm the ingested tables appear with data.

Where people trip

Gateway pipeline fails to start

The gateway runs on classic compute in your own cloud network (a VPC on AWS, a VNet on Azure). Check the cluster config (instance types, autoscale settings) and confirm the workspace is allowed to launch clusters with those instance types.

Ingestion pipeline cannot connect to the source

The gateway cluster needs network access to the source database. Check VPC peering, security group rules, and firewall settings, and confirm the connection object still has valid credentials.

Permission error when creating the pipeline

Whoever runs databricks bundle deploy, user or service principal, needs workspace-admin access or explicit permission to create pipelines. Check the workspace admin settings.

Do next: Build the first pipeline
Learn why: Unity Catalog foundations
Reference: Lakeflow Connect (Databricks docs)
