Change Data Feed (CDF)
Applies To: Pipeline Bundle
Configuration Scope: Data Flow Spec
Databricks Docs:
Overview
Change Data Feed (CDF) is a Delta Lake feature that enables tracking of row-level changes between versions of a Delta table. The framework provides built-in support for CDF to help track and process data changes efficiently.
Configuration
Enabling CDF on a Table
To enable CDF on a target or staging table, add the delta.enableChangeDataFeed property to the tableProperties object inside targetDetails in your Data Flow Spec and set it to true. The example below shows the JSON form followed by the equivalent YAML:
{
  "targetFormat": "delta",
  "targetDetails": {
    "table": "my_table",
    "tableProperties": {
      "delta.enableChangeDataFeed": "true"
    },
    "schemaPath": "customer_schema.json"
  }
}

targetFormat: delta
targetDetails:
  table: my_table
  tableProperties:
    delta.enableChangeDataFeed: 'true'
  schemaPath: customer_schema.json
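If specs are generated or patched programmatically, the same property can be set with a few lines of Python. The sketch below is illustrative only: `enable_cdf` is a hypothetical helper, not part of the framework, and it assumes the spec has already been loaded into a plain dictionary shaped like the example above.

```python
import copy
import json


def enable_cdf(spec: dict) -> dict:
    """Return a copy of a Data Flow Spec dict with CDF enabled on its target.

    Hypothetical helper for illustration; not part of the framework.
    """
    updated = copy.deepcopy(spec)  # leave the caller's spec untouched
    target = updated.setdefault("targetDetails", {})
    props = target.setdefault("tableProperties", {})
    # The value is written as the string "true", matching the example above.
    props["delta.enableChangeDataFeed"] = "true"
    return updated


spec = {
    "targetFormat": "delta",
    "targetDetails": {"table": "my_table", "schemaPath": "customer_schema.json"},
}
print(json.dumps(enable_cdf(spec), indent=2))
```

Working on a deep copy keeps the original spec unchanged, which makes it easier to compare the spec before and after the property is added.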
Reading From CDF in a View
CDF is read through a view. When specifying a view in your Data Flow Spec, set the cdfEnabled attribute to true in its sourceDetails. There are several types of Data Flow Spec and several ways to specify a view; refer to the Data Flow Spec Reference documentation for more information.
Standard Dataflow Spec example:
{
  "sourceViewName": "v_customer_address",
  "sourceDetails": {
    "database": "{bronze_schema}",
    "table": "customer_address",
    "cdfEnabled": true
  }
}

sourceViewName: v_customer_address
sourceDetails:
  database: '{bronze_schema}'
  table: customer_address
  cdfEnabled: true
Flows Dataflow Spec example:
{
  "views": {
    "v_customer": {
      "mode": "stream",
      "sourceType": "delta",
      "sourceDetails": {
        "database": "{bronze_schema}",
        "table": "customer",
        "cdfEnabled": true
      }
    }
  }
}

views:
  v_customer:
    mode: stream
    sourceType: delta
    sourceDetails:
      database: '{bronze_schema}'
      table: customer
      cdfEnabled: true
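For orientation, a view with cdfEnabled set to true reads the source table's change feed rather than its current contents. In plain PySpark, a change feed read is requested with the Delta reader option readChangeFeed. The sketch below shows what such a streaming read could look like; `read_change_feed_stream` and its parameters are hypothetical, and how the framework actually builds its readers is an assumption, not something taken from its code.

```python
def read_change_feed_stream(spark, database: str, table: str):
    """Open a streaming read of a Delta table's change feed.

    Hypothetical helper: `spark` is an existing SparkSession, and the
    framework's real reader logic may differ from this sketch.
    """
    return (
        spark.readStream
        # Delta Lake reader option: return row-level changes
        # instead of the table's current contents.
        .option("readChangeFeed", "true")
        .table(f"{database}.{table}")
    )
```

Here `database` would be the resolved schema name (the value substituted for a placeholder such as {bronze_schema}), and the source table must already have delta.enableChangeDataFeed set for the read to succeed.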
Important Considerations:
Refer to the Databricks documentation for information on:
Concepts
Schema / CDF columns
Change types
Limitations
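As a concrete picture of the schema and change types: each row read from the change feed carries the table's own columns plus the metadata columns _change_type, _commit_version, and _commit_timestamp, where _change_type is one of insert, update_preimage, update_postimage, or delete. The plain-Python sketch below needs no Spark and uses made-up sample rows to show one common way of collapsing a set of changes into the latest state per key. The customer_id key, the sample data, and the `latest_state` helper are illustrative assumptions, and _commit_timestamp is omitted from the samples for brevity.

```python
def latest_state(changes, key="customer_id"):
    """Collapse change rows into the latest row per key.

    update_preimage rows hold the old values and are skipped; a delete
    removes the key. Rows are applied in commit-version order.
    """
    state = {}
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change = row["_change_type"]
        if change == "update_preimage":
            continue
        if change == "delete":
            state.pop(row[key], None)
        else:  # insert or update_postimage
            # Keep only the table's own columns, not the CDF metadata.
            state[row[key]] = {k: v for k, v in row.items() if not k.startswith("_")}
    return state


changes = [
    {"customer_id": 1, "city": "Leeds", "_change_type": "insert", "_commit_version": 1},
    {"customer_id": 2, "city": "York", "_change_type": "insert", "_commit_version": 1},
    {"customer_id": 1, "city": "Leeds", "_change_type": "update_preimage", "_commit_version": 2},
    {"customer_id": 1, "city": "Bath", "_change_type": "update_postimage", "_commit_version": 2},
    {"customer_id": 2, "city": "York", "_change_type": "delete", "_commit_version": 3},
]
print(latest_state(changes))  # prints {1: {'customer_id': 1, 'city': 'Bath'}}
```

The sketch assumes each key changes at most once per commit version; the Databricks documentation remains the reference for the exact column definitions and limitations.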