Data Quality - Expectations
Applies To: |
Pipeline Bundle |
Configuration Scope: |
Data Flow Spec |
Databricks Docs: |
https://docs.databricks.com/en/delta-live-tables/expectations.html |
Define expectations in your Data Flow Spec to apply quality constraints that validate data as it flows through ETL pipelines.
Configuration
Defining Expectations
In a Pipeline Bundle bundle, expectations are defined at a table level and must be located in an expectations sub-folder of the directory containing the corresponding Data Flow Spec. Examples:
flat structure:
src/dataflows/expectations/<table_name>_dqe.jsonorganized by target table:
src/dataflows/<table>/expectations/<table_name>_dqe.json
Note
The expectations file name can have any name (with a .json extension), but best practice is to use the pattern <table_name>_dqe.json.
The schema for the expectations file is defined below, in the Expectations Schema section.
Enabling and Referencing Expectations
Once the expectations are defined, they can be enabled and referenced in the Data Flow Spec, per the following sections of this documentation:
Expectations Schema
{
"<expectation_type>": [
{
"name": "<expectation name>",
"contraint": "SQL constraint",
"tag": "<tag>",
"enabled": "<true or false>"
}
]
}
Field |
Type |
Description |
|---|---|---|
type of expectation |
|
The type of expectation. Valid types are:
|
name |
|
The name of the expectation, any unique name can be given. |
constraint |
|
The SQL constraint that defines the expectation. The constraint should be a valid SQL query that returns a boolean value. If the constraint returns true, the record is considered valid; else, it is considered invalid. |
tag (optional) |
|
The tag is used to group expectations together. |
enabled (optional) |
|
Specifies if the expectation is enabled or not. If the expectation is enabled, it will be validated; otherwise, it will be ignored. If not specified, the expectation will be enabled by default. |
Examples
{
"expect": [
{
"name": "expectation_1",
"constraint": "column_1 > 0",
"tag": "tag_1",
"enabled": true
}
],
"expect_or_drop": [
{
"name": "expectation_2",
"constraint": "column_2 < 100",
"tag": "tag_2",
"enabled": true
}
],
"expect_or_fail": [
{
"name": "expectation_3",
"constraint": "column_3 != 'NULL'",
"tag": "tag_3",
"enabled": true
}
]
}