Validation

Applies To: Pipeline Bundle

Configuration Scope: Pipeline Bundle

Databricks Docs: NA

Overview

The framework uses the Python jsonschema library to define the schema and validation rules for the following:

  • Data Flow Specifications

  • Expectations

  • Secrets Configurations

These schemas are applied every time a pipeline executes, as described in the next section.
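
For illustration, the sketch below shows how this style of jsonschema validation works. The schema and the validate_spec helper are hypothetical and heavily simplified; they are not the framework's actual schemas or API.

from jsonschema import Draft7Validator

# Hypothetical, simplified schema; the framework's real schema for
# Data Flow Specifications covers far more fields.
DATA_FLOW_SPEC_SCHEMA = {
    "type": "object",
    "properties": {
        "dataFlowId": {"type": "string"},
        "dataFlowGroup": {"type": "string"},
    },
    "required": ["dataFlowId", "dataFlowGroup"],
}

def validate_spec(spec: dict) -> list:
    """Return a list of human-readable validation errors (empty if valid)."""
    validator = Draft7Validator(DATA_FLOW_SPEC_SCHEMA)
    return [f"{'/'.join(map(str, e.path))}: {e.message}"
            for e in validator.iter_errors(spec)]

# Example: a spec missing dataFlowGroup yields one "required property" error.
print(validate_spec({"dataFlowId": "orders_bronze"}))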

How Validation Works

The framework uses the jsonschema library to validate the Data Flow Specifications, Expectations, and Secrets Configurations. Each time a pipeline executes, the following steps are performed:

Step 1: Load and Initialize Framework

Load and initialize the framework.

Step 2: Retrieve Data Flow Specifications

  1. Retrieve and validate (see the sketch after these steps):

      • Read and validate all of the Data Flow Specifications, Expectations, and Secrets Configurations from the workspace files location of the Pipeline Bundle.

      • If a file is not valid, it is added to an error list.

      • If any files failed validation, the pipeline fails and the user receives the full list of validation errors.

  2. Apply pipeline filters:

      • The framework applies any pipeline filters to the in-memory dictionary.

      • The only exception is the File Filter, which causes the framework to read only the specified file(s).

Step 3: Generate Pipeline Definition

The framework uses the in-memory dictionary to initialize the Spark Declarative Pipeline.

Step 4: Execute Pipeline

The pipeline executes the logic defined in the Data Flow Specifications.
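
For illustration, the sketch below shows the collect-then-fail behaviour of step 2 together with a simple data flow group filter. The helper names and JSON layout are hypothetical (it reuses validate_spec from the earlier sketch); this is not the framework's actual implementation.

import json
from pathlib import Path

def load_and_validate(spec_dir: str, group_filter: str = "") -> dict:
    specs, errors = {}, {}

    # Step 2.1: read and validate every file, collecting errors instead of
    # failing on the first bad file.
    for path in Path(spec_dir).glob("**/*.json"):
        spec = json.loads(path.read_text())
        file_errors = validate_spec(spec)  # jsonschema check from the earlier sketch
        if file_errors:
            errors[str(path)] = file_errors
        else:
            specs[str(path)] = spec

    # All failures are reported together so the user sees the full list.
    if errors:
        raise ValueError(f"Validation failed for {len(errors)} file(s): {errors}")

    # Step 2.2: apply pipeline filters to the in-memory dictionary, e.g. a
    # group filter such as pipeline.dataFlowGroupFilter.
    if group_filter:
        specs = {p: s for p, s in specs.items()
                 if s.get("dataFlowGroup") == group_filter}

    return specs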

Ignoring Validation Errors

Ignoring validation errors can be useful when iterating in Dev or SIT environments: it lets you focus on the specific Data Flow Specs selected by your pipeline filters without being blocked by validation errors elsewhere in the bundle.

You can ignore validation errors by setting the pipeline.ignoreValidationErrors configuration to True.

You can do this in the pipeline resource YAML file, as shown below, or via the Databricks UI in the Spark Declarative Pipeline Settings.

resources:
    pipelines:
        dlt_framework_samples_bronze_base_pipeline:
            name: Lakeflow Framework Samples - Bronze - Base Pipeline (${var.logical_env})
            channel: CURRENT
            serverless: true
            catalog: ${var.catalog}
            schema: ${var.schema}
            libraries:
                - notebook:
                    path: ${var.framework_source_path}/dlt_pipeline

            configuration:
                bundle.sourcePath: ${workspace.file_path}/src
                bundle.target: ${bundle.target}
                framework.sourcePath: ${var.framework_source_path}
                workspace.host: ${var.workspace_host}
                pipeline.layer: ${var.layer}
                logicalEnv: ${var.logical_env}
                pipeline.dataFlowGroupFilter: base_samples
                pipeline.ignoreValidationErrors: True
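
Inside a running pipeline, configuration values such as this are exposed through the Spark conf as strings. A minimal sketch of reading the flag, assuming the ambient spark session available in Databricks pipeline code (the framework's actual lookup mechanism is not shown here):

# "spark" is the SparkSession Databricks provides in pipeline code.
# Pipeline configuration values arrive as strings, hence the comparison.
ignore_errors = (
    spark.conf.get("pipeline.ignoreValidationErrors", "False").lower() == "true"
)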