Validation
| Applies To: | Pipeline Bundle |
|---|---|
| Configuration Scope: | Pipeline Bundle |
| Databricks Docs: | NA |
Overview
The framework uses the Python jsonschema library to define the schema and validation rules for the following:
- Data Flow Specifications
- Expectations
- Secrets Configurations
This provides the following functionality:
- Validation in your CI/CD pipelines
- Validation at Spark Declarative Pipeline initialization time
How Validation Works
The framework uses the jsonschema library to validate the Data Flow Specifications, Expectations, and Secrets Configurations.
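As a rough illustration of this kind of check, the sketch below validates a single specification against a schema and collects every violation instead of stopping at the first. The schema, its field names (`dataFlowId`, `dataFlowGroup`, `targetTable`) and the `validate_spec` helper are hypothetical and are not taken from the framework; only the `jsonschema` calls (`Draft7Validator`, `iter_errors`) are real library API.

```python
from jsonschema import Draft7Validator

# Hypothetical, heavily simplified schema for illustration only.
# The framework's real schemas are not reproduced here.
DATA_FLOW_SPEC_SCHEMA = {
    "type": "object",
    "properties": {
        "dataFlowId": {"type": "string", "minLength": 1},
        "dataFlowGroup": {"type": "string"},
        "targetTable": {"type": "string"},
    },
    "required": ["dataFlowId", "dataFlowGroup"],
    "additionalProperties": False,
}


def validate_spec(spec: dict, schema: dict = DATA_FLOW_SPEC_SCHEMA) -> list:
    """Return one readable message per schema violation (empty list if valid)."""
    validator = Draft7Validator(schema)
    # Sort by the location of the error so the output order is stable.
    errors = sorted(
        validator.iter_errors(spec),
        key=lambda err: [str(part) for part in err.absolute_path],
    )
    messages = []
    for err in errors:
        location = "/".join(str(part) for part in err.absolute_path) or "<root>"
        messages.append(f"{location}: {err.message}")
    return messages


if __name__ == "__main__":
    for line in validate_spec({"dataFlowId": "", "unexpected": 1}):
        print(line)
```

Collecting all errors through `iter_errors` rather than raising on the first one tends to suit CI/CD runs, where a full list of problems per file is more useful than a single failure.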
Each time a pipeline executes, the following steps are performed:
| Step | Name | Description |
|---|---|---|
| 1 | Load and Initialize Framework | Load and initialize the Framework |
| 2 | Retrieve Data Flow Specifications | |
| 3 | Generate Pipeline Definition | The Framework will then use the in memory dictionary to initialize the Spark Declarative Pipeline. |
| 4 | Execute Pipeline | The pipeline will then execute the logic defined in the Data Flow Specifications. |
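The retrieval and validation that feed the in-memory dictionary could look roughly like the sketch below. It is not the framework's implementation: it assumes the specifications are stored as JSON files, that each one carries a hypothetical `dataFlowId` key, and that validation is supplied as a callable returning a list of error messages. The name `load_valid_specs` is likewise invented for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def load_valid_specs(spec_dir: Path, validate: Callable[[dict], list]) -> dict:
    """Read every *.json file under spec_dir and return {spec id: spec}.

    Raises ValueError naming every file that failed validation.
    """
    specs = {}
    failures = {}
    for path in sorted(spec_dir.rglob("*.json")):
        spec = json.loads(path.read_text(encoding="utf-8"))
        errors = validate(spec)
        if errors:
            failures[str(path)] = errors
        else:
            # "dataFlowId" is a hypothetical key; fall back to the file name.
            specs[spec.get("dataFlowId", path.stem)] = spec
    if failures:
        raise ValueError(f"Invalid Data Flow Specifications: {failures}")
    return specs
```

Passing the validator in as a callable keeps the loader independent of any particular schema, so the same function can be reused from a CI/CD job and at pipeline initialization time.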
Ignoring Validation Errors
Ignoring validation errors can be useful when you are iterating in Dev or SIT environments and want to focus on specific Data Flow Specs (selected by your pipeline filters) without being blocked by validation errors.
You can ignore validation errors by setting the pipeline.ignoreValidationErrors configuration to True.
You can do this in the pipeline resource YAML file or via the Databricks UI in the Spark Declarative Pipeline Settings.
resources:
  pipelines:
    dlt_framework_samples_bronze_base_pipeline:
      name: Lakeflow Framework Samples - Bronze - Base Pipeline (${var.logical_env})
      channel: CURRENT
      serverless: true
      catalog: ${var.catalog}
      schema: ${var.schema}
      libraries:
        - notebook:
            path: ${var.framework_source_path}/dlt_pipeline
      configuration:
        bundle.sourcePath: ${workspace.file_path}/src
        bundle.target: ${bundle.target}
        framework.sourcePath: ${var.framework_source_path}
        workspace.host: ${var.workspace_host}
        pipeline.layer: ${var.layer}
        logicalEnv: ${var.logical_env}
        pipeline.dataFlowGroupFilter: base_samples
        pipeline.ignoreValidationErrors: True
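
To make the effect of this setting concrete, here is a minimal sketch of how a flag like `pipeline.ignoreValidationErrors` might be interpreted. It assumes the configuration is available as a plain dictionary whose values may arrive as strings such as `"True"` or `"true"`. The function names, and the choice to log a warning and continue, are assumptions for illustration rather than the framework's documented behaviour.

```python
import logging

logger = logging.getLogger(__name__)

IGNORE_KEY = "pipeline.ignoreValidationErrors"


def ignore_validation_errors(conf: dict) -> bool:
    """Interpret the flag leniently: True, "True" and "true" all enable it."""
    raw = conf.get(IGNORE_KEY, "false")
    return str(raw).strip().lower() == "true"


def handle_validation_errors(errors: dict, conf: dict) -> None:
    """Raise on validation errors unless the ignore flag is set.

    `errors` maps a spec name to its list of messages (hypothetical shape).
    """
    if not errors:
        return
    if ignore_validation_errors(conf):
        # Assumed behaviour: report the problem and carry on.
        for name, messages in errors.items():
            logger.warning("Ignoring invalid spec %s: %s", name, messages)
        return
    raise ValueError(f"Validation failed for: {sorted(errors)}")
```

Defaulting to strict behaviour when the key is absent means validation errors are only ever skipped when someone has opted in explicitly.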