Creating a Materialized View Data Flow Spec Reference

A Materialized View Data Flow Spec is designed for creating and maintaining materialized views that aggregate or transform data from source tables.

Schema

The following schema details the configuration for a Materialized View Data Flow Spec:

{
    "dataFlowId": "feature_materialized_views",
    "dataFlowGroup": "feature_samples",
    "dataFlowType": "materialized_view",
    "materializedViews": {
        "mv_name": {
            "sourceView": {
                "sourceViewName": "",
                "sourceType": "[delta|python|sql]",
                "sourceDetails": {}
            },
            "sqlPath": "",
            "sqlStatement": "",
            "tableDetails": {
                "database": "",
                "schemaPath": "",
                "tableProperties": {},
                "path": "",
                "partitionColumns": [],
                "clusterByColumns": []
            },
            "dataQualityExpectationsEnabled": false,
            "dataQualityExpectationsPath": "",
            "quarantineMode": "off",
            "quarantineTargetDetails": {}
        }
    }
}

Example:

The below demonstrates a Materialized View Data Flow Spec:

{
    "dataFlowId": "feature_materialized_views",
    "dataFlowGroup": "feature_samples",
    "dataFlowType": "materialized_view",
    "materializedViews": {
        "mv_from_source_view": {
            "sourceView": {
                "sourceViewName": "v_mv_source_view",
                "sourceType": "delta",
                "sourceDetails": {
                    "database": "{staging_schema}",
                    "table": "customer",
                    "cdfEnabled": true
                }
            },
            "tableDetails": {
                "database": "{gold_schema}",
                "tableProperties": {
                    "delta.autoOptimize.optimizeWrite": "true",
                    "delta.autoOptimize.autoCompact": "true"
                },
                "clusterByColumns": ["year", "month"],
                "schemaPath": "schemas/customer_metrics_mv.json"
            },
        },
        "mv_from_sql_path": {
            "sqlPath": "./mv_from_sql_path.sql"
        },
        "mv_from_sql_statement": {
            "sqlStatement": "SELECT * FROM {staging_schema}.customer"
        },
        "mv_with_quarantine": {
            "sqlStatement": "SELECT * FROM {staging_schema}.customer_address",
            "dataQualityExpectationsEnabled": true,
            "dataQualityExpectationsPath": "./customer_address_dqe.json",
            "quarantineMode": "table",
            "quarantineTargetDetails": {
                "targetFormat": "delta"
            }
        }
    }
}

The above dataflow spec sample contains the following core components:

  • Dataflow metadata configuration

  • Source configuration

  • Table configuration

  • Data quality and quarantine settings

The following sections detail each of the above components.

Dataflow Metadata Configuration

These properties define the basic identity and type of the dataflow:

Field

Type

Description

dataFlowId

string

A unique identifier for the data flow.

dataFlowGroup

string

The group to which the data flow belongs, can be the same as dataFlowId if there is no group.

dataFlowType

string

The type of data flow. Must be materialized_view for materialized view dataflows.

Source Configuration

These properties define the source of the data:

Field

Type

Description

sourceSystem (optional)

string

The source system name. Value is not used to determine or change any behaviour.

sourceType

string

The type of source. Supported: cloudFiles, delta, sql, python

sourceViewName

string

The name to assign the source view. String Pattern: v_([A-Za-z0-9_]+)

sourceDetails

object

See Data Flow Spec - Source Details for more information.

Table Configuration

These properties define the materialized view specific configuration:

Field

Type

Description

tableDetails

object

Configuration specific to materialized views.

database

string

The schema to write the materialized view to.

schemaPath

string

The path to the schema file for the materialized view.

tableProperties

object

The table properties to set on the materialized view.

path

string

A storage location for table data. If not set, use the managed storage location for the schema containing the table.

partitionColumns

array

The columns to partition the materialized view by.

clusterByColumns

array

The suggested columns to cluster the materialized view by.

comment

string

A description for the materialized view.

spark_conf (optional)

object

A list of Spark configurations for the execution of this query.

private (optional)

boolean

Create a table, but do not publish the table to the metastore.

Data Quality and Quarantine Configuration

These properties control how data quality issues are handled:

Field

Type

Description

dataQualityExpectationsEnabled (optional)

boolean

A flag indicating whether data quality expectations are enabled (see Data Quality - Quarantine).

dataQualityExpectationsPath (optional)

string

Either a relative path or filename for the expectations file. Note that the framework automatically calculates all relative paths from the appropriate expectations sub-folder, in the Pipeline Bundle. Examples:

  • All expectations files in the expectations sub-folder: . or *

  • A specific expectations file: my_table_dqe.json

quarantineMode (optional)

string

The mode for handling quarantined data. It can be off, flag, or table. Supported: [“off”, “flag”, “table”]

quarantineTargetDetails (optional)

object

Details about the quarantine target, only required if quarantineMode is set to table. See quarantineTargetDetails section below.

quarantineTargetDetails

The quarantineTargetDetails object contains the following properties:

Parameter

Type

Description

targetFormat

string

The format of the quarantine target. Currently, only delta is supported.

Supported: ["delta"]
Default: "delta"

table

string

(conditional) The table name, required if targetFormat is delta.

tableProperties

object

(conditional) Additional properties for the table, required if targetFormat is delta.

path

string

(conditional) The path to the table, required if targetFormat is delta.