Databases and SaaS Ingestion

You'll understand how Lakeflow Connect ingestion pipelines work and choose a deployment method in ~5 min.

Prereqs: Access your data

Why this matters

A connection lets Databricks reach an external system. An ingestion pipeline puts it to work: it replicates data continuously, change data capture included, into Unity Catalog tables. No pipeline, and the connection just sits there while your data stays in the source.

How it works

A Lakeflow Connect ingestion pipeline is two pieces, and the split matters because they run in different places:

The ingestion gateway pulls data from the source and stages it in cloud storage. It runs on classic compute inside your VPC, because that is what can reach the source over the network.
The ingestion pipeline reads the staged data and writes it into Unity Catalog tables. It can run serverless, since it only touches Unity Catalog default storage.

See Managed connector in Lakeflow Connect for details on which component runs where.
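
To make the split concrete, here is a minimal toy model in plain Python. It is not Lakeflow Connect code: `ChangeEvent`, `Staging`, `run_gateway`, and `run_ingestion_pipeline` are hypothetical names invented for illustration, and an in-memory list stands in for the cloud storage staging area. It only shows the hand-off: the gateway stages changes, and the pipeline applies them to the destination table.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChangeEvent:
    """One captured change from the source (hypothetical shape)."""
    op: str                      # "upsert" or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row contents for upserts


@dataclass
class Staging:
    """Stand-in for the cloud storage location between the two components."""
    events: list = field(default_factory=list)


def run_gateway(source_changes, staging):
    """Gateway role: pull changes from the source and stage them.

    In the real system this is the piece that needs network access
    to the source, which is why it runs on classic compute.
    """
    for change in source_changes:
        staging.events.append(change)


def run_ingestion_pipeline(staging, table):
    """Pipeline role: read staged changes and apply them to the destination.

    It never talks to the source; it only sees what the gateway staged.
    """
    for event in staging.events:
        if event.op == "delete":
            table.pop(event.key, None)
        else:
            table[event.key] = event.row
    staging.events.clear()  # staged changes are consumed once applied
    return table


# Illustrative change stream: two inserts, one update, one delete.
source_changes = [
    ChangeEvent("upsert", 1, {"name": "Ada"}),
    ChangeEvent("upsert", 2, {"name": "Grace"}),
    ChangeEvent("upsert", 1, {"name": "Ada L."}),
    ChangeEvent("delete", 2),
]

staging = Staging()
run_gateway(source_changes, staging)
destination = run_ingestion_pipeline(staging, {})
print(destination)  # {1: {'name': 'Ada L.'}}
```

Because the two functions share nothing but the staging area, each can run in a different place, which is the point of the two-component design.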

Video tutorials

Lakeflow Connect overview

SQL Server change data capture (CDC)

Salesforce

SharePoint

ServiceNow

Deploy with DABs

Reach for this when the gateway needs classic compute, or when you want the pipeline version-controlled and repeatable instead of clicked together in the UI:

DABs definition: define and deploy an ingestion pipeline with Databricks Asset Bundles.
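
As a rough sketch of what such a definition contains, the Python below builds a dictionary that mirrors the shape of a bundle's `resources` section: one gateway pipeline and one ingestion pipeline that points at it. Treat every field name here (`gateway_definition`, `ingestion_definition`, `ingestion_gateway_id`, and so on) as an assumption to check against the Databricks Asset Bundles and pipeline reference, and every value (`my_connection`, `main`, `staging`, `bronze`, the table names) as a placeholder. The helper `build_bundle_resources` is hypothetical.

```python
import json


def build_bundle_resources(connection_name, staging_catalog, staging_schema,
                           dest_catalog, dest_schema, source_schema, source_tables):
    """Return a dict shaped like a bundle `resources` section (assumed layout)."""
    gateway = {
        "name": "example-gateway",
        # Assumed field names: where the gateway connects and where it stages data.
        "gateway_definition": {
            "connection_name": connection_name,
            "gateway_storage_catalog": staging_catalog,
            "gateway_storage_schema": staging_schema,
        },
    }
    ingestion = {
        "name": "example-ingestion",
        "catalog": dest_catalog,
        "schema": dest_schema,
        "ingestion_definition": {
            # Bundle-style reference to the gateway defined above.
            "ingestion_gateway_id": "${resources.pipelines.gateway.id}",
            # One entry per source table to replicate.
            "objects": [
                {
                    "table": {
                        "source_schema": source_schema,
                        "source_table": table,
                        "destination_catalog": dest_catalog,
                        "destination_schema": dest_schema,
                    }
                }
                for table in source_tables
            ],
        },
    }
    return {"resources": {"pipelines": {"gateway": gateway, "ingest": ingestion}}}


# Placeholder values only; substitute your own connection and catalog names.
resources = build_bundle_resources(
    connection_name="my_connection",
    staging_catalog="main",
    staging_schema="staging",
    dest_catalog="main",
    dest_schema="bronze",
    source_schema="dbo",
    source_tables=["customers", "orders"],
)
print(json.dumps(resources, indent=2))
```

Keeping both pipelines in one definition is what makes the deployment repeatable: the ingestion pipeline refers to the gateway by reference rather than by an ID copied from the UI.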

Do next: DABs definition
Learn why: Unity Catalog foundations
Reference: Lakeflow Connect (Databricks docs)
