Lakeflow Framework documentation
The Lakeflow Framework is a metadata-driven data engineering framework built for Databricks. It accelerates and simplifies the deployment of Spark Declarative Pipelines (SDP) while supporting your entire software development lifecycle.
Key Capabilities:
- Build robust data pipelines using a configuration-driven, Lego-block approach (see the sketch after this list)
- Support batch and streaming workloads across the medallion architecture (Bronze, Silver, Gold)
- Deploy seamlessly with Databricks Asset Bundles (DABs), with no wheel files or control tables required
- Extend and maintain easily as your data platform evolves
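
To make the configuration-driven idea concrete, here is a minimal, purely illustrative sketch: a pipeline step described as metadata, turned into a declarative streaming table by generic builder code. The spec fields (`name`, `source_path`, `source_format`, `comment`) and the `build_bronze_table` helper are hypothetical and do not reflect the framework's actual data flow spec format; the sketch also assumes a Databricks pipeline runtime where the `dlt` module and the `spark` session are available.

```python
# Illustrative only: hypothetical spec fields, not the framework's real format.
import dlt  # available inside a Databricks declarative pipeline

bronze_spec = {
    "name": "bronze_orders",
    "source_path": "/Volumes/raw/sales/orders",
    "source_format": "json",
    "comment": "Raw orders ingested as-is (Bronze layer)",
}

def build_bronze_table(spec: dict) -> None:
    """Register a streaming Bronze table from a spec dict (illustration only)."""

    @dlt.table(name=spec["name"], comment=spec["comment"])
    def _table():
        # Auto Loader reads the raw files described by the spec.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", spec["source_format"])
            .load(spec["source_path"])
        )

build_bronze_table(bronze_spec)
```

The framework applies this pattern at scale: table definitions live in configuration, and the builder code that interprets them stays generic and reusable.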
This documentation covers everything from getting started to advanced orchestration patterns. Explore the sections below to begin building reliable, maintainable data pipelines.
Contents:
- Introduction
- Getting Started
- Concepts
- Features
  - Auto Complete / Intellisense
  - Builder Parallelization
  - Change Data Capture (CDC)
  - Change Data Feed (CDF)
  - Data Quality - Expectations
  - Data Quality - Quarantine
  - Direct Publishing Mode
  - Liquid Clustering
  - Logging
  - Logical Environments
  - Materialized Views
  - Mandatory Table Properties
  - Multi-Source Streaming
  - Operational Metadata
  - Python Dependency Management
  - Python Extensions
  - Python Function Transforms
  - Schemas
  - Data Flow Specification Format
  - Secrets Management
  - Soft Deletes
  - Source Types
  - Spark Configuration
  - Substitutions
  - Table Migration
  - Target Types
  - Templates
  - Validation
  - Versioning - DataFlow Specs
  - Versioning - Framework
  - UI Integration
- Deploy the Framework
- The Samples
- Build and Deploy Pipelines
  - Bundle Scope and Structure
  - Building a Pipeline Bundle
  - Deploying a Pipeline Bundle
  - Pipeline Execution
  - Patterns: Data Flows and Pipelines
- Data Flow Spec Reference
  - Creating a Standard Data Flow Spec Reference
  - Creating a Flows Data Flow Spec Reference
  - Creating a Materialized View Data Flow Spec Reference
- Orchestration
- Framework Development & Contributors