# Supported Source Types
| | |
|---|---|
| **Applies To:** | Pipeline Bundle |
| **Configuration Scope:** | Data Flow Spec |
| **Databricks Docs:** | https://docs.databricks.com/en/delta-live-tables/python-ref.html |
The Lakeflow Framework supports multiple source types. Each source type provides specific configuration options to handle different data ingestion scenarios.
## Source Types

| Type | Description |
|---|---|
| Batch Files | Reads data in batch from UC Volumes or cloud storage locations (e.g., S3, ADLS, GCS). Supports various file formats and provides options for filtering and transforming data during ingestion. |
| Cloud Files | Incrementally reads data from UC Volumes or cloud storage locations (e.g., S3, ADLS, GCS) using Auto Loader. Supports various file formats and provides options for filtering and transforming data during ingestion. |
| Delta | Connects to existing Delta tables in the metastore, supporting both batch and streaming reads with change data feed (CDF) capabilities. |
| Delta Join | Enables joining multiple Delta tables, supporting both streaming and static join patterns. |
| Kafka | Enables reading from Apache Kafka topics for real-time streaming data processing. |
| Python | Allows using a Python function as a data source, providing flexibility for complex data transformations. |
| SQL | Allows using SQL queries as data sources, providing flexibility for complex data transformations. |
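
As an illustration of how two of these source types typically surface in pipeline code, the sketch below pairs Cloud Files with an Auto Loader read and Delta with a change data feed read. The table name and volume path are hypothetical, and `dlt` and `spark` are provided by the Databricks pipeline runtime.

```python
import dlt  # provided by the Databricks pipeline runtime


@dlt.table(name="orders_raw")
def orders_raw():
    # Cloud Files: incremental file ingestion via Auto Loader
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/orders")  # hypothetical UC Volume path
    )


@dlt.table(name="orders_changes")
def orders_changes():
    # Delta: streaming read of an existing table's change data feed (CDF)
    return (
        spark.readStream.format("delta")
        .option("readChangeFeed", "true")
        .table("main.bronze.orders")  # hypothetical source table
    )
```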
## General Data Flow Spec Configuration

Set the source type as an attribute when creating your Data Flow Spec; refer to the Data Flow Spec Reference documentation for more information:
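
For example, a minimal, hypothetical Data Flow Spec fragment might declare its source type as follows. The attribute names here are illustrative, not the authoritative schema; consult the Data Flow Spec Reference for the exact field names.

```python
# Hypothetical Data Flow Spec fragment; attribute names are illustrative.
data_flow_spec = {
    "dataFlowId": "orders_bronze",   # illustrative flow identifier
    "sourceType": "cloud_files",     # one of the source types listed above
    "sourceDetails": {
        "path": "/Volumes/main/landing/orders",  # hypothetical UC Volume path
        "format": "json",
    },
}
```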