Materialized Views
Applies To: |
Pipeline Bundle |
Configuration Scope: |
Data Flow Spec |
Databricks Docs: |
Overview
Materialized Views are the precomputed results of a query stored in a Table. Please refer to the above documentation for full details on Materialized Views and how they work.
Key Features:
Automatic updates based on pipeline schedule/triggers
Guaranteed consistency with source data. All required data is processed, even if it arrives late or out of order.
Incremental refresh optimization. Databricks will try to choose the appropriate strategy that minimizes the cost of updating a materialized view.
Ideal for transformations and aggregations
Pre-computation of slow queries
Optimization for frequently used computations
Important
To support Incremental refresh, some keywords and clauses require row-tracking to be enabled on the queried data sources. Refer to the the following links for details on:
Sample Bundle
A sample is available in:
the
bronze_samplebundle in thesrc/dataflows/feature_samplesfolder in thematerialized_views_main.json|yamlfilethe
gold_samplebundle in thesrc/dataflows/base_samplesfolder in thematerialized_views_main.json|yamlfile
Data Flow Spec Configuration
Materialized Views are must be configured in the Materialized Views Data Flow Spec Type. This Data Flow Specification is defined in the Creating a Materialized View Data Flow Spec Reference documentation.
Data Flow Spec Configuration Schema
The following schema details the configuration for a Materialized View Data Flow Spec:
{
"dataFlowId": "feature_materialized_views",
"dataFlowGroup": "feature_samples",
"dataFlowType": "materialized_view",
"materializedViews": {
"mv_name": {
"sourceView": {
"sourceViewName": "",
"sourceType": "[delta|python|sql]",
"sourceDetails": {}
},
"sqlPath": "",
"sqlStatement": "",
"tableDetails": {
"database": "",
"schemaPath": "",
"tableProperties": {},
"path": "",
"partitionColumns": [],
"clusterByColumns": []
},
"dataQualityExpectationsEnabled": false,
"dataQualityExpectationsPath": "",
"quarantineMode": "off",
"quarantineTargetDetails": {}
}
}
}
dataFlowId: feature_materialized_views
dataFlowGroup: feature_samples
dataFlowType: materialized_view
materializedViews:
mv_name:
sourceView:
sourceViewName: ''
sourceType: '[delta|python|sql]'
sourceDetails: {}
sqlPath: ''
sqlStatement: ''
tableDetails:
database: ''
schemaPath: ''
tableProperties: {}
path: ''
partitionColumns: []
clusterByColumns: []
dataQualityExpectationsEnabled: false
dataQualityExpectationsPath: ''
quarantineMode: 'off'
quarantineTargetDetails: {}
Source Type Details
Materialized Views can be configured in your Data Flow Spec in three ways:
Using a Source View - Define a source view that the materialized view will be based on - Supports Delta, Python, and SQL source types - Example configuration:
{ "sourceView": { "sourceViewName": "v_customer", "sourceType": "delta", "sourceDetails": { "database": "{staging_schema}", "table": "customer", "cdfEnabled": true } } }
sourceView: sourceViewName: v_customer sourceType: delta sourceDetails: database: '{staging_schema}' table: customer cdfEnabled: true
Using SQL Path - Reference a SQL file containing the query for the materialized view - Example configuration:
{ "sqlPath": "./customer_mv.sql" }
sqlPath: ./customer_mv.sql
Using SQL Statement - Directly specify the SQL query for the materialized view - Example configuration:
{ "sqlStatement": "SELECT * FROM {staging_schema}.customer" }
sqlStatement: SELECT * FROM {staging_schema}.customer
Additional Configuration Options
Materialized Views support several additional configuration options:
Table Details: Configure the target table properties - Database - Schema path - Table properties - Path - Partition columns - Cluster by columns
Data Quality Expectations - Enable data quality checks - Specify expectations path - Configure quarantine mode (off, flag, table)
Quarantine Configuration - Set quarantine mode - Configure quarantine target details
Example Configuration
A complete example of a materialized view configuration:
{
"dataFlowId": "feature_materialized_views",
"dataFlowGroup": "feature_samples",
"dataFlowType": "materialized_view",
"materializedViews": {
"mv_from_source_view": {
"sourceView": {
"sourceViewName": "v_mv_source_view",
"sourceType": "delta",
"sourceDetails": {
"database": "{staging_schema}",
"table": "customer",
"cdfEnabled": true
}
}
},
"mv_from_sql_path": {
"sqlPath": "./mv_from_sql_path.sql"
},
"mv_from_sql_statement": {
"sqlStatement": "SELECT * FROM {staging_schema}.customer"
},
"mv_with_quarantine": {
"sqlStatement": "SELECT * FROM {staging_schema}.customer_address",
"dataQualityExpectationsEnabled": true,
"dataQualityExpectationsPath": "./customer_address_dqe.json",
"quarantineMode": "table",
"quarantineTargetDetails": {
"targetFormat": "delta"
}
}
}
}
dataFlowId: feature_materialized_views
dataFlowGroup: feature_samples
dataFlowType: materialized_view
materializedViews:
mv_from_source_view:
sourceView:
sourceViewName: v_mv_source_view
sourceType: delta
sourceDetails:
database: '{staging_schema}'
table: customer
cdfEnabled: true
mv_from_sql_path:
sqlPath: ./mv_from_sql_path.sql
mv_from_sql_statement:
sqlStatement: SELECT * FROM {staging_schema}.customer
mv_with_quarantine:
sqlStatement: SELECT * FROM {staging_schema}.customer_address
dataQualityExpectationsEnabled: true
dataQualityExpectationsPath: ./customer_address_dqe.json
quarantineMode: table
quarantineTargetDetails:
targetFormat: delta
For more detailed information about configuration options, refer to the Data Flow Spec Reference documentation.