Operational Metadata

Applies To:

Framework Bundle

Configuration Scope:

Global

Databricks Docs:

NA

Operational metadata is data that describes a given data load and pipeline execution, for example:

  • Ingestion date

  • Execution date

  • Update ID / Job ID

  • File name (for data ingested from files)

The Operational Metadata feature of the Framework allows for the addition of metadata columns to all target tables generated by the pipelines you have defined in the framework.

Best Practice

You can define and add as many metadata columns as you require. However, it is recommended that you group your data into logically organized struct or map columns to avoid cluttering the table schema.

Configuration

Scope: Global
In the Framework bundle, operational metadata columns are defined in JSON configuration files at Lakehouse layer level (e.g. bronze, silver, gold). The configuration files are locate at and must be named as follows: src/config/operational_metadata_<layer>.json

Layer Config

  • The layer suffix of the file needs to match one of the layers as defined in your pipeline / Data Flow Spec configurations e.g. bronze, silver, gold.

  • The operational columns defined in the JSON configuration will be applied to all tables in the corresponding layer.

Configuration Schema

The operational metadata configuration file must follow the schema below:

{
    "type": "struct",
    "fields": [
        {
            "name": "<column_name>",
            "type": {
                "type": "data_type",
                "nullable": true/false,
                "metadata": {
                    "mapping": {
                        "type": "<mapping_type>",
                        "payload": "<mapping_payload>"
                    }
                },
            }
        },
        ...
    ]
}

Field

Description

name

The name of the operational metadata column.

type

The data_type of the column. For a list of valid data types please refer to the Databricks documentation: `https://docs.databricks.com/en/sql/language-manual/sql-ref-datatypes.html`_

nullable

A boolean that indicates whether the field can be null or not.

metadata

A JSON object that contains the metadata for the column.

metadata.mapping

A JSON object that contains the mapping configuration for the column.

metadata.mapping.type

The type of mapping to be applied to the column. See below for supported types.

metadata.mapping.payload

The SQL string to be applied to the column.

Mapping Types

Field

Description

Payload

sql

The column values will be derived by executing the SQL string provided in the payload.

Any valid SQL function or expression, per the Databricks SQL Language Reference.

pipeline_detail

The name of any single pipeline_detail attribute.

The following attributes are available:

  • pipeline_id: The pipeline id

  • pipeline_update_id: The pipeline update id

  • pipeline_layer: The medallion layer for the pipeline

  • start_utc_timestamp: The start timestamp of the pipeline in UTC

  • pipeline_catalog: The SDP target catalog

  • pipeline_schema: The SDP target schema/database

  • workspace_env: The workspace environment

  • logical_env: The logical environment

Note

For existing tables/pipelines, the operational metadata schema can be added or modified only if the table schema is not enforced and schema evolution is allowed (in this case, only new records will have the correct values in the added/modified fields). Otherwise the pipeline has to be fully refreshed to reprocess all data and apply operational metadata.

Best Practice Configuration

It is recommended that you group your data into logically organized struct or map columns to avoid cluttering the table schema, as decribed by the below configuration schema: The operational metadata configuration file must follow the schema below:

{
    "type": "struct",
    "fields": [
        {
            "name": "<column_name>",
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "name": "<sub_column_name>",
                        "type": {
                            "type": "data_type",
                            "nullable": true/false,
                            "metadata": {
                                "mapping": {
                                    "type": "<mapping_type>",
                                    "payload": "<mapping_payload>"
                                }
                            },
                        }
                    },
                    ...
                ]
            },
            ...
        },
        ...
    ]
}

Examples

The below example illustrates the default configuration for a generic bronze and silver layer deployment.

{
    "type": "struct",
    "fields": [
        {
            "name": "meta_load_details",
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "name": "record_insert_timestamp",
                        "type": "timestamp",
                        "nullable": false,
                        "metadata": {
                            "mapping": {
                                "type": "sql",
                                "payload": "current_timestamp()"
                            }
                        }
                    },
                    {
                        "name": "pipeline_update_id",
                        "type": "string",
                        "nullable": false,
                        "metadata": {
                            "mapping": {
                                "type": "pipeline_detail",
                                "payload": "pipeline_update_id"
                            }
                        }
                    }
                ]
            }
        }
    ]
}