Materialized Views

Applies To:	Pipeline Bundle
Configuration Scope:	Data Flow Spec
Databricks Docs:	Materialized Views Delta Live Tables Python Reference Delta Live Tables SQL Reference

Overview

Materialized Views are the precomputed results of a query stored in a Table. Please refer to the above documentation for full details on Materialized Views and how they work.

Key Features:

Automatic updates based on pipeline schedule/triggers
Guaranteed consistency with source data. All required data is processed, even if it arrives late or out of order.
Incremental refresh optimization. Databricks will try to choose the appropriate strategy that minimizes the cost of updating a materialized view.
Ideal for transformations and aggregations
Pre-computation of slow queries
Optimization for frequently used computations

Important

To support Incremental refresh, some keywords and clauses require row-tracking to be enabled on the queried data sources. Refer to the the following links for details on:

Incremental Refresh

Row Tracking

Sample Bundle

A sample is available in:

the bronze_sample bundle in the src/dataflows/feature_samples folder in the materialized_views_main.json|yaml file

the gold_sample bundle in the src/dataflows/base_samples folder in the materialized_views_main.json|yaml file

Data Flow Spec Configuration

Materialized Views are must be configured in the Materialized Views Data Flow Spec Type. This Data Flow Specification is defined in the Creating a Materialized View Data Flow Spec Reference documentation.

Data Flow Spec Configuration Schema

The following schema details the configuration for a Materialized View Data Flow Spec:

{
    "dataFlowId": "feature_materialized_views",
    "dataFlowGroup": "feature_samples",
    "dataFlowType": "materialized_view",
    "materializedViews": {
        "mv_name": {
            "sourceView": {
                "sourceViewName": "",
                "sourceType": "[delta|python|sql]",
                "sourceDetails": {}
            },
            "sqlPath": "",
            "sqlStatement": "",
            "tableDetails": {
                "database": "",
                "schemaPath": "",
                "tableProperties": {},
                "path": "",
                "partitionColumns": [],
                "clusterByColumns": []
            },
            "dataQualityExpectationsEnabled": false,
            "dataQualityExpectationsPath": "",
            "quarantineMode": "off",
            "quarantineTargetDetails": {}
        }
    }
}

dataFlowId: feature_materialized_views
dataFlowGroup: feature_samples
dataFlowType: materialized_view
materializedViews:
  mv_name:
    sourceView:
      sourceViewName: ''
      sourceType: '[delta|python|sql]'
      sourceDetails: {}
    sqlPath: ''
    sqlStatement: ''
    tableDetails:
      database: ''
      schemaPath: ''
      tableProperties: {}
      path: ''
      partitionColumns: []
      clusterByColumns: []
    dataQualityExpectationsEnabled: false
    dataQualityExpectationsPath: ''
    quarantineMode: 'off'
    quarantineTargetDetails: {}

Source Type Details

Materialized Views can be configured in your Data Flow Spec in three ways:

Using a Source View - Define a source view that the materialized view will be based on - Supports Delta, Python, and SQL source types - Example configuration:

{
    "sourceView": {
        "sourceViewName": "v_customer",
        "sourceType": "delta",
        "sourceDetails": {
            "database": "{staging_schema}",
            "table": "customer",
            "cdfEnabled": true
        }
    }
}

sourceView:
  sourceViewName: v_customer
  sourceType: delta
  sourceDetails:
    database: '{staging_schema}'
    table: customer
    cdfEnabled: true

Using SQL Path - Reference a SQL file containing the query for the materialized view - Example configuration:
{ "sqlPath": "./customer_mv.sql" }
sqlPath: ./customer_mv.sql

Using SQL Statement - Directly specify the SQL query for the materialized view - Example configuration:

{
    "sqlStatement": "SELECT * FROM {staging_schema}.customer"
}

sqlStatement: SELECT * FROM {staging_schema}.customer

Additional Configuration Options

Materialized Views support several additional configuration options:

Table Details: Configure the target table properties - Database - Schema path - Table properties - Path - Partition columns - Cluster by columns
Data Quality Expectations - Enable data quality checks - Specify expectations path - Configure quarantine mode (off, flag, table)
Quarantine Configuration - Set quarantine mode - Configure quarantine target details

Example Configuration

A complete example of a materialized view configuration:

{
    "dataFlowId": "feature_materialized_views",
    "dataFlowGroup": "feature_samples",
    "dataFlowType": "materialized_view",
    "materializedViews": {
        "mv_from_source_view": {
            "sourceView": {
                "sourceViewName": "v_mv_source_view",
                "sourceType": "delta",
                "sourceDetails": {
                    "database": "{staging_schema}",
                    "table": "customer",
                    "cdfEnabled": true
                }
            }
        },
        "mv_from_sql_path": {
            "sqlPath": "./mv_from_sql_path.sql"
        },
        "mv_from_sql_statement": {
            "sqlStatement": "SELECT * FROM {staging_schema}.customer"
        },
        "mv_with_quarantine": {
            "sqlStatement": "SELECT * FROM {staging_schema}.customer_address",
            "dataQualityExpectationsEnabled": true,
            "dataQualityExpectationsPath": "./customer_address_dqe.json",
            "quarantineMode": "table",
            "quarantineTargetDetails": {
                "targetFormat": "delta"
            }
        }
    }
}

dataFlowId: feature_materialized_views
dataFlowGroup: feature_samples
dataFlowType: materialized_view
materializedViews:
  mv_from_source_view:
    sourceView:
      sourceViewName: v_mv_source_view
      sourceType: delta
      sourceDetails:
        database: '{staging_schema}'
        table: customer
        cdfEnabled: true
  mv_from_sql_path:
    sqlPath: ./mv_from_sql_path.sql
  mv_from_sql_statement:
    sqlStatement: SELECT * FROM {staging_schema}.customer
  mv_with_quarantine:
    sqlStatement: SELECT * FROM {staging_schema}.customer_address
    dataQualityExpectationsEnabled: true
    dataQualityExpectationsPath: ./customer_address_dqe.json
    quarantineMode: table
    quarantineTargetDetails:
      targetFormat: delta

For more detailed information about configuration options, refer to the Data Flow Spec Reference documentation.