Substitutions

Applies To: Framework Bundle, Pipeline Bundle

Configuration Scope: Global, Pipeline

Databricks Docs: NA

When deploying pipeline bundles to different environments (e.g. dev, sit, prod), resource names such as schema names, storage accounts and URLs usually differ between environments. The substitutions feature caters for this by letting you replace values in your Data Flow Specs and SQL scripts with values defined in a configuration file.

Substitutions can be configured in two ways:

  • tokenized

    Tokens can be included in your Data Flow Specs or SQL statements, indicated by curly braces (e.g. {bronze_schema_x}), and a substitution value is assigned to each token in the substitutions config file. Note that tokens can be applied recursively.

  • prefix/suffix

    Prefixes and suffixes can be assigned to Data Flow Spec attributes. The prefix or suffix is then automatically added to the value of that attribute wherever the attribute appears in a Data Flow Spec, even if it is nested.
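
As an illustration of the tokenized mode, the following is a minimal Python sketch, not the framework's actual implementation. It assumes that "recursively" means a token's value may itself contain other tokens; the function name resolve_tokens, the max_depth guard and the catalog token are hypothetical.

```python
import re

# Matches {token_name}; only word characters are treated as token names.
TOKEN_PATTERN = re.compile(r"\{(\w+)\}")


def resolve_tokens(text: str, tokens: dict[str, str], max_depth: int = 10) -> str:
    """Replace {token} placeholders, repeating so token values may contain tokens."""
    for _ in range(max_depth):
        # Unknown tokens are left untouched rather than raising.
        replaced = TOKEN_PATTERN.sub(
            lambda m: tokens.get(m.group(1), m.group(0)), text
        )
        if replaced == text:
            # A pass that changes nothing means substitution has settled.
            return replaced
        text = replaced
    raise ValueError(
        f"Substitution did not settle after {max_depth} passes (circular tokens?)"
    )


# Hypothetical token set: bronze_schema_x refers to another token, catalog.
tokens = {
    "catalog": "main",
    "bronze_schema_x": "{catalog}.bronze_marketing",
}

print(resolve_tokens("SELECT * FROM {bronze_schema_x}.contract", tokens))
# prints: SELECT * FROM main.bronze_marketing.contract
```

The depth guard is one way to stop circular token definitions from looping forever; whether the framework detects such cycles is not covered here.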

A few reserved tokens exist by default:

  • workspace_env: The target workspace environment, i.e. the target that appears in the databricks.yml file.

Important

Ensure that commonly used substitutions are stored in the Global Framework configuration rather than individual Pipeline Bundles.

For example, maintain schema names in global substitution files.

Configuration

Scope: Global
In the Framework bundle, substitutions are defined in the following configuration file: src/config/<deployment environment/target>_substitutions.json|yaml
e.g. src/config/dev_substitutions.json|yaml
Scope: Pipeline
In a Pipeline bundle, substitutions are defined in the following configuration file: src/pipeline_configs/<deployment environment/target>_substitutions.json|yaml
e.g. src/pipeline_configs/dev_substitutions.json|yaml

Note

The <deployment environment/target> portion of the substitutions config file name must be the same as one of the environment targets listed in the databricks.yml file, as this determines which environment the bundle will be deployed to.
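
The naming rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper substitutions_path is hypothetical, and the list of targets is passed in directly rather than parsed from databricks.yml.

```python
from pathlib import Path


def substitutions_path(
    config_dir: str, target: str, targets: list[str], extension: str = "json"
) -> Path:
    """Build the substitutions file path for a deployment target.

    `targets` stands in for the targets listed in databricks.yml.
    """
    if target not in targets:
        raise ValueError(f"Unknown target {target!r}; expected one of {targets}")
    return Path(config_dir) / f"{target}_substitutions.{extension}"


print(substitutions_path("src/config", "dev", ["dev", "sit", "prod"]).as_posix())
# prints: src/config/dev_substitutions.json
```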

Precedence

The Global substitutions and Pipeline substitutions are merged, with Pipeline substitutions taking precedence.
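
A minimal Python sketch of this precedence rule is shown below. It assumes the merge happens key by key within each top-level section (tokens and prefix_suffix); the exact merge granularity used by the framework is not stated here, and the function name merge_substitutions and the sample values are hypothetical.

```python
def merge_substitutions(global_cfg: dict, pipeline_cfg: dict) -> dict:
    """Merge two substitution configs; pipeline entries override global ones."""
    merged = {}
    for section in ("tokens", "prefix_suffix"):
        # Later keys win in a dict merge, so pipeline values take precedence.
        merged[section] = {
            **global_cfg.get(section, {}),
            **pipeline_cfg.get(section, {}),
        }
    return merged


global_cfg = {
    "tokens": {
        "bronze_schema_x": "bronze_marketing",
        "bronze_schema_y": "bronze_collections",
    }
}
pipeline_cfg = {"tokens": {"bronze_schema_x": "bronze_marketing_sandbox"}}

merged = merge_substitutions(global_cfg, pipeline_cfg)
print(merged["tokens"]["bronze_schema_x"])  # prints: bronze_marketing_sandbox
print(merged["tokens"]["bronze_schema_y"])  # prints: bronze_collections
```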

Configuration Schema

The structure of the substitutions config file should be as below:

{
    "tokens": {
        "<token>": "<value>",
        ...
    },
    "prefix_suffix": {
        "<attribute_name>": {
            "prefix | suffix": "<value>"
        },
        ...
    }
}

Field

Description

tokens

key-value pairs for tokenized substitutions.

prefix_suffix

Object containing additional objects that define the substitution behavior for the given attributes.

  • attribute_name: the Data Flow Spec attribute you wish to apply the prefix or suffix to.

  • prefix | suffix: the substitution mode.

  • value: the value to be added as a prefix or suffix. NOTE:
    • the value can be a token.

    • workspace_env is a reserved token that can be used to pass through the workspace environment (from the databricks.yml file).

Examples

Below is a sample output of substitutions applied for a given substitutions file and Data Flow Spec.

Substitutions config:

{
    "tokens": {
        "bronze_schema_x": "bronze_marketing",
        "bronze_schema_y": "bronze_collections"
    },
    "prefix_suffix": {
        "database": {
            "suffix": "{workspace_env}"
        }
    }
}

Data Flow Spec input:

{
    ...
        "flows": {
            "f_contract": {
                "flowType": "append_view",
                "flowDetails": {
                    "targetTable": "staging_table_apnd_3",
                    "sourceView": "v_brz_contract"
                },
                "views": {
                    "v_brz_contract": {
                        "mode": "stream",
                        "sourceType": "delta",
                        "sourceDetails": {
                            "database": "main.{bronze_schema_x}",
                            "table": "contract",
                            "cdfEnabled": true,
                            "selectExp": [
                                "*"
                            ],
                            "whereClause": []
                        }
                    }
                }
            },
            "f_loan": {
                "flowType": "append_view",
                "flowDetails": {
                    "targetTable": "staging_table_apnd_3",
                    "sourceView": "v_brz_loan"
                },
                "views": {
                    "v_brz_loan": {
                        "mode": "stream",
                        "sourceType": "delta",
                        "sourceDetails": {
                            "database": "main.{bronze_schema_y}",
                            "table": "loan",
                            "cdfEnabled": true
                        }
                    }
                }
            },
            ...
        }
    ...
}

Data Flow Spec output:

{
    ...
        "flows": {
            "f_contract": {
                "flowType": "append_view",
                "flowDetails": {
                    "targetTable": "staging_table_apnd_3",
                    "sourceView": "v_brz_contract"
                },
                "views": {
                    "v_brz_contract": {
                        "mode": "stream",
                        "sourceType": "delta",
                        "sourceDetails": {
                            "database": "main.bronze_marketing_dev",
                            "table": "contract",
                            "cdfEnabled": true,
                            "selectExp": [
                                "*"
                            ],
                            "whereClause": []
                        }
                    }
                }
            },
            "f_loan": {
                "flowType": "append_view",
                "flowDetails": {
                    "targetTable": "staging_table_apnd_3",
                    "sourceView": "v_brz_loan"
                },
                "views": {
                    "v_brz_loan": {
                        "mode": "stream",
                        "sourceType": "delta",
                        "sourceDetails": {
                            "database": "main.bronze_collections_dev",
                            "table": "loan",
                            "cdfEnabled": true
                        }
                    }
                }
            },
            ...
        }
    ...
}

Notice that every database field has the workspace environment suffix added to it.

The tokenized substitution takes place first. The workspace environment suffix (dev) is then added to every field named database, anywhere within the Data Flow Spec.
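
The two-step order described above can be sketched in Python as follows. This is an illustrative sketch, not the framework's code: all function names are hypothetical, the Data Flow Spec is trimmed to the relevant fields, and the sketch assumes plain string concatenation, so the underscore is written into the suffix value itself. Whether the framework inserts a separator on its own is not stated here.

```python
import re

TOKEN = re.compile(r"\{(\w+)\}")


def fill(text, tokens):
    """Replace each {token} in a string; unknown tokens are left untouched."""
    return TOKEN.sub(lambda m: tokens.get(m.group(1), m.group(0)), text)


def replace_tokens(node, tokens):
    """Step 1: walk dicts and lists, filling tokens in every string value."""
    if isinstance(node, dict):
        return {k: replace_tokens(v, tokens) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_tokens(v, tokens) for v in node]
    if isinstance(node, str):
        return fill(node, tokens)
    return node


def add_prefix_suffix(node, rules):
    """Step 2: wrap string values whose key has a rule, at any nesting depth."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key in rules and isinstance(value, str):
                rule = rules[key]
                out[key] = rule.get("prefix", "") + value + rule.get("suffix", "")
            else:
                out[key] = add_prefix_suffix(value, rules)
        return out
    if isinstance(node, list):
        return [add_prefix_suffix(v, rules) for v in node]
    return node


def apply_substitutions(spec, config, workspace_env):
    # Assumption: the reserved workspace_env token overrides any user token.
    tokens = {**config.get("tokens", {}), "workspace_env": workspace_env}
    # Prefix/suffix values may themselves be tokens, so resolve them first.
    rules = replace_tokens(config.get("prefix_suffix", {}), tokens)
    return add_prefix_suffix(replace_tokens(spec, tokens), rules)


config = {
    "tokens": {"bronze_schema_x": "bronze_marketing"},
    # Assumption: plain concatenation, so the underscore is part of the suffix.
    "prefix_suffix": {"database": {"suffix": "_{workspace_env}"}},
}
spec = {
    "views": {
        "v_brz_contract": {
            "sourceDetails": {
                "database": "main.{bronze_schema_x}",
                "table": "contract",
            }
        }
    }
}

result = apply_substitutions(spec, config, "dev")
print(result["views"]["v_brz_contract"]["sourceDetails"]["database"])
# prints: main.bronze_marketing_dev
```

Only the nested database value is changed; the table value has no rule and contains no token, so it passes through untouched.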