Databases and SaaS Ingestion
You'll understand how Lakeflow Connect ingestion pipelines work and choose a deployment method in ~5 min.
Prereqs: Access your data
Why this matters
A connection lets Databricks reach an external system. An ingestion pipeline puts it to work: it replicates data continuously, change data capture included, into Unity Catalog tables. No pipeline, and the connection just sits there while your data stays in the source.
How it works
For database sources, a Lakeflow Connect ingestion setup is two pieces, and the split matters because they run in different places:
- The ingestion gateway pulls data from the source and stages it in cloud storage. It runs on classic compute inside your VPC, because that is what can reach the source over the network.
- The ingestion pipeline reads the staged data and writes it into Unity Catalog (UC) tables. It can run on serverless compute, since it only touches UC default storage.
See Managed connectors in Lakeflow Connect for details on which component runs where.
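The gateway/pipeline split can be sketched as a toy model in plain Python. This is a conceptual illustration only, not the Lakeflow Connect API: the class names `ChangeRecord`, `StagingArea`, `IngestionGateway`, and `IngestionPipeline`, and the helper `replicate`, are hypothetical, and an in-memory list stands in for cloud storage. The point it shows is that only the gateway touches the source, while the pipeline only reads what has been staged.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChangeRecord:
    """One change captured from the source (hypothetical shape)."""
    op: str                      # "upsert" or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row contents for upserts


@dataclass
class StagingArea:
    """Stands in for the cloud storage location between the two components."""
    batches: List[List[ChangeRecord]] = field(default_factory=list)


class IngestionGateway:
    """Runs close to the source: pulls changes and stages them."""

    def __init__(self, source_changes: List[ChangeRecord], staging: StagingArea):
        self._pending = list(source_changes)
        self._staging = staging

    def pull_and_stage(self, batch_size: int) -> int:
        """Stage up to batch_size pending changes; return how many were staged."""
        batch = self._pending[:batch_size]
        self._pending = self._pending[batch_size:]
        if batch:
            self._staging.batches.append(batch)
        return len(batch)


class IngestionPipeline:
    """Never contacts the source: applies staged changes to a destination table."""

    def __init__(self, staging: StagingArea):
        self._staging = staging
        self.table: Dict[int, dict] = {}

    def apply_staged(self) -> int:
        """Apply all staged batches in order; return the number of changes applied."""
        applied = 0
        while self._staging.batches:
            for change in self._staging.batches.pop(0):
                if change.op == "delete":
                    self.table.pop(change.key, None)
                else:
                    self.table[change.key] = change.row
                applied += 1
        return applied


def replicate(changes: List[ChangeRecord], batch_size: int) -> Dict[int, dict]:
    """Run the gateway until the source is drained, then run the pipeline."""
    staging = StagingArea()
    gateway = IngestionGateway(changes, staging)
    pipeline = IngestionPipeline(staging)
    while gateway.pull_and_stage(batch_size):
        pass
    pipeline.apply_staged()
    return pipeline.table


if __name__ == "__main__":
    changes = [
        ChangeRecord("upsert", 1, {"name": "Ada"}),
        ChangeRecord("upsert", 2, {"name": "Grace"}),
        ChangeRecord("upsert", 1, {"name": "Ada L."}),
        ChangeRecord("delete", 2),
    ]
    print(replicate(changes, batch_size=2))
```

Because the two classes share nothing but the staging area, each can be placed wherever its dependencies are reachable. That is the same reasoning behind running the gateway on classic compute and the pipeline on serverless.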
Video tutorials
Lakeflow Connect overview
SQL Server change data capture (CDC)
Salesforce
SharePoint
ServiceNow
Deploy with DABs
Reach for Databricks Asset Bundles (DABs) when the gateway needs classic compute, or when you want the pipeline version-controlled and repeatable instead of clicked together in the UI:
- DABs definition: define and deploy an ingestion pipeline with Databricks Asset Bundles.
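To give a feel for what a bundle captures, here is a rough sketch that builds the two resources as a Python dictionary and prints them as JSON for inspection. It rests on assumptions: the key names (`gateway_definition`, `ingestion_definition`, `ingestion_gateway_id`, and so on) reflect my understanding of the Databricks pipeline resource schema and should be checked against the DABs definition page, and the function name, resource keys, and every value passed in are hypothetical placeholders. A real bundle expresses the same structure in YAML.

```python
import json


def build_ingestion_resources(connection_name, staging_catalog, staging_schema,
                              source_table, destination):
    """Return a dict shaped like a bundle's `resources` section (assumed schema)."""
    src_catalog, src_schema, src_table = source_table
    dest_catalog, dest_schema = destination
    return {
        "resources": {
            "pipelines": {
                # Gateway: pulls from the source and stages the data.
                "gateway": {
                    "name": "example-gateway",
                    "gateway_definition": {
                        "connection_name": connection_name,
                        "gateway_storage_catalog": staging_catalog,
                        "gateway_storage_schema": staging_schema,
                    },
                },
                # Ingestion pipeline: reads staged data, writes UC tables.
                "ingestion": {
                    "name": "example-ingestion",
                    "ingestion_definition": {
                        # Bundle-style reference to the gateway resource above.
                        "ingestion_gateway_id": "${resources.pipelines.gateway.id}",
                        "objects": [
                            {
                                "table": {
                                    "source_catalog": src_catalog,
                                    "source_schema": src_schema,
                                    "source_table": src_table,
                                    "destination_catalog": dest_catalog,
                                    "destination_schema": dest_schema,
                                }
                            }
                        ],
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    resources = build_ingestion_resources(
        connection_name="my_sqlserver_connection",   # placeholder
        staging_catalog="main",                      # placeholder
        staging_schema="staging",                    # placeholder
        source_table=("sales_db", "dbo", "orders"),  # placeholder
        destination=("main", "bronze"),              # placeholder
    )
    print(json.dumps(resources, indent=2))
```

The detail worth noticing is that the ingestion pipeline refers to the gateway by resource reference rather than by a hard-coded ID, which is what makes the pair deployable together and repeatable across environments.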
Next
- Do next: DABs definition
- Learn why: Unity Catalog foundations
- Reference: Lakeflow Connect (Databricks docs)