Multi-Source Streaming
Applies To: Pipeline Bundle
Configuration Scope: Data Flow Spec
Databricks Docs:
Delta Live Tables supports processing that reads from multiple streaming sources to update a single streaming table via:
Append Flows - Append streams from multiple sources into a single streaming table.
Change Flows - Process CDC events from multiple sources into a single streaming table using the CDC APIs.
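As a rough illustration, an append flow in DLT Python looks like the following. This is a sketch, not a runnable script: it only executes inside a DLT pipeline, where the `spark` session is provided by the runtime, and the table and source names (`orders`, `raw_orders_us`, `raw_orders_eu`) are hypothetical.

```python
import dlt

# Single target streaming table updated by several independent flows.
dlt.create_streaming_table("orders")

# Each @dlt.append_flow appends its stream into the shared target table
# and keeps its own checkpoint, keyed by the flow (function) name.
@dlt.append_flow(target="orders")
def orders_us():
    return spark.readStream.table("raw_orders_us")

@dlt.append_flow(target="orders")
def orders_eu():
    return spark.readStream.table("raw_orders_eu")
```

Because each flow has its own checkpoint, a third source can later be added as another `@dlt.append_flow` function without refreshing the target table.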
The Lakeflow Framework implements this capability via the Data Flow Spec using the concept of flow groups and flows.
Configuration
In a Pipeline Bundle, multi-source streaming is configured in the Data Flow Spec using the flow_groups and flows attributes.
These attributes are documented in _flow-group-configuration and _flow-configuration.
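A minimal configuration sketch is shown below. Only the flow_groups and flows attribute names come from the Data Flow Spec; the nested keys and values are illustrative assumptions, not authoritative syntax.

```yaml
# Illustrative sketch only: nested keys and names below are assumed.
flow_groups:
  - name: orders_ingest
    target: orders            # single streaming table written by all flows
    flows:
      - name: orders_us       # flow name identifies the streaming checkpoint
        source: raw_orders_us
      - name: orders_eu
        source: raw_orders_eu
```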
Key Features
Write to a single streaming table from multiple source streams
Add or remove streaming sources without requiring a full table refresh
Support for backfilling historical data
Alternative to UNION operations for combining multiple sources
Maintain separate checkpoints for each flow
Important Considerations
Flow names are used to identify streaming checkpoints
Renaming an existing flow creates a new checkpoint, so the renamed flow reprocesses its source from the beginning
Flow names must be unique within a pipeline
Data quality expectations should be defined on the target table, not in flow definitions
Append flows process multiple sources more efficiently than an equivalent UNION operation
Append SQL flows do not support quarantine table mode (quarantine flag mode is supported), because quarantine table mode requires a source view
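The checkpoint behavior above can be sketched with a toy model (plain Python, not DLT code, with all names hypothetical): checkpoints are keyed only by flow name, so a renamed flow looks like a brand-new flow and replays its source from the start.

```python
# Toy model of flow-name-keyed checkpoints (not DLT code): each flow
# tracks how far it has read its source, keyed only by the flow's name.
checkpoints = {}

def run_flow(name, source, target):
    # A renamed flow has no checkpoint entry, so it starts from offset 0.
    offset = checkpoints.get(name, 0)
    new_rows = source[offset:]
    target.extend(new_rows)
    checkpoints[name] = len(source)
    return new_rows

target = []
us = ["us-1", "us-2"]
eu = ["eu-1"]

run_flow("orders_us", us, target)   # reads both US rows
run_flow("orders_eu", eu, target)   # reads the EU row
run_flow("orders_us", us, target)   # nothing new: checkpoint has advanced

# Renaming the flow creates a fresh checkpoint and replays the source,
# duplicating rows in the target.
run_flow("orders_us_v2", us, target)
```

This is why the considerations above warn that renaming an existing flow effectively discards its progress, while adding or removing a differently named flow leaves the other flows' checkpoints untouched.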
See Also
feature_source_target_types