Operational Metadata

Applies To:	Framework Bundle
Configuration Scope:	Global
Databricks Docs:	NA

Operational metadata is data that describes a given data load and pipeline execution, for example:

Ingestion date
Execution date
Update ID / Job ID
File name (for data ingested from files)

The Operational Metadata feature of the Framework allows for the addition of metadata columns to all target tables generated by the pipelines you have defined in the framework.

Best Practice

You can define and add as many metadata columns as you require. However, it is recommended that you group your data into logically organized struct or map columns to avoid cluttering the table schema.

Configuration

Scope: Global

In the Framework bundle, operational metadata columns are defined in JSON configuration files at Lakehouse layer level (e.g. bronze, silver, gold). The configuration files are locate at and must be named as follows: src/config/operational_metadata_<layer>.json

Layer Config

The layer suffix of the file needs to match one of the layers as defined in your pipeline / Data Flow Spec configurations e.g. bronze, silver, gold.

The operational columns defined in the JSON configuration will be applied to all tables in the corresponding layer.

Configuration Schema

The operational metadata configuration file must follow the schema below:

{
    "type": "struct",
    "fields": [
        {
            "name": "<column_name>",
            "type": {
                "type": "data_type",
                "nullable": true/false,
                "metadata": {
                    "mapping": {
                        "type": "<mapping_type>",
                        "payload": "<mapping_payload>"
                    }
                },
            }
        },
        ...
    ]
}

Field	Description
name	The name of the operational metadata column.
type	The data_type of the column. For a list of valid data types please refer to the Databricks documentation: `https://docs.databricks.com/en/sql/language-manual/sql-ref-datatypes.html`_
nullable	A boolean that indicates whether the field can be null or not.
metadata	A JSON object that contains the metadata for the column.
metadata.mapping	A JSON object that contains the mapping configuration for the column.
metadata.mapping.type	The type of mapping to be applied to the column. See below for supported types.
metadata.mapping.payload	The SQL string to be applied to the column.

Mapping Types

Field

Description

Payload

sql

The column values will be derived by executing the SQL string provided in the payload.

Any valid SQL function or expression, per the Databricks SQL Language Reference.

pipeline_detail

The name of any single pipeline_detail attribute.

The following attributes are available:

pipeline_id: The pipeline id

pipeline_update_id: The pipeline update id

pipeline_layer: The medallion layer for the pipeline

start_utc_timestamp: The start timestamp of the pipeline in UTC

pipeline_catalog: The SDP target catalog

pipeline_schema: The SDP target schema/database

workspace_env: The workspace environment

logical_env: The logical environment

Note

For existing tables/pipelines, the operational metadata schema can be added or modified only if the table schema is not enforced and schema evolution is allowed (in this case, only new records will have the correct values in the added/modified fields). Otherwise the pipeline has to be fully refreshed to reprocess all data and apply operational metadata.

Best Practice Configuration

It is recommended that you group your data into logically organized struct or map columns to avoid cluttering the table schema, as decribed by the below configuration schema: The operational metadata configuration file must follow the schema below:

{
    "type": "struct",
    "fields": [
        {
            "name": "<column_name>",
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "name": "<sub_column_name>",
                        "type": {
                            "type": "data_type",
                            "nullable": true/false,
                            "metadata": {
                                "mapping": {
                                    "type": "<mapping_type>",
                                    "payload": "<mapping_payload>"
                                }
                            },
                        }
                    },
                    ...
                ]
            },
            ...
        },
        ...
    ]
}

Examples

The below example illustrates the default configuration for a generic bronze and silver layer deployment.

{
    "type": "struct",
    "fields": [
        {
            "name": "meta_load_details",
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "name": "record_insert_timestamp",
                        "type": "timestamp",
                        "nullable": false,
                        "metadata": {
                            "mapping": {
                                "type": "sql",
                                "payload": "current_timestamp()"
                            }
                        }
                    },
                    {
                        "name": "pipeline_update_id",
                        "type": "string",
                        "nullable": false,
                        "metadata": {
                            "mapping": {
                                "type": "pipeline_detail",
                                "payload": "pipeline_update_id"
                            }
                        }
                    }
                ]
            }
        }
    ]
}