Operational Metadata
====================

.. list-table::
   :header-rows: 0

   * - **Applies To:**
     - :bdg-info:`Framework Bundle`
   * - **Configuration Scope:**
     - :bdg-info:`Global`
   * - **Databricks Docs:**
     - NA

Operational metadata is data that describes a given data load and pipeline execution, for example:

* Ingestion date
* Execution date
* Update ID / Job ID
* File name (for data ingested from files)

The Framework's Operational Metadata feature adds such metadata columns to all target tables created by the pipelines you define in the Framework.

.. admonition:: Best Practice
   :class: note

   You can define as many metadata columns as you need. However, it is recommended that you group them into logically organized ``struct`` or ``map`` columns to avoid cluttering the table schema.

Configuration
-------------

| **Scope: Global**
| In the Framework bundle, operational metadata columns are defined in JSON configuration files, one per Lakehouse layer (e.g. bronze, silver, gold). Each file must use the following location and name: ``src/config/operational_metadata_<layer>.json``

.. admonition:: Layer Config
   :class: note

   * The ``<layer>`` suffix of the file name must match one of the layers defined in your pipeline / Data Flow Spec configurations, e.g. ``bronze``, ``silver``, ``gold``.
   * The operational metadata columns defined in a layer's JSON configuration are applied to all tables in that layer.
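
As a rough illustration of the naming convention, the Python sketch below builds and reads the expected file for a given layer. The helper names ``operational_metadata_path`` and ``load_operational_metadata`` are hypothetical and are not the Framework's own loader; the sketch assumes only the ``src/config/operational_metadata_<layer>.json`` convention, with paths resolved relative to the bundle root.

.. code-block:: python

    import json
    from pathlib import Path

    # Hypothetical default location, relative to the bundle root.
    CONFIG_DIR = Path("src/config")


    def operational_metadata_path(layer: str, config_dir: Path = CONFIG_DIR) -> Path:
        """Build the expected config path for a layer, e.g. 'bronze'.

        The layer name is used verbatim, because it must match the layer
        name used in the pipeline / Data Flow Spec configuration.
        """
        if not layer:
            raise ValueError("layer must be a non-empty string")
        return config_dir / f"operational_metadata_{layer}.json"


    def load_operational_metadata(layer: str, config_dir: Path = CONFIG_DIR) -> dict:
        """Read and parse the operational metadata JSON for one layer."""
        path = operational_metadata_path(layer, config_dir)
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)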

Configuration Schema
--------------------

The operational metadata configuration file must follow the schema below:

.. code-block:: json

    {
        "type": "struct",
        "fields": [
            {
                "name": "<column_name>",
                "type": "<data_type>",
                "nullable": true/false,
                "metadata": {
                    "mapping": {
                        "type": "<mapping_type>",
                        "payload": "<mapping_payload>"
                    }
                }
            },
            ...
        ]
    }

.. list-table::
   :header-rows: 1

   * - Field
     - Description
   * - **name**
     - The name of the operational metadata column.
   * - **type**
     - The data type of the column, or a nested type definition such as ``struct`` (see Best Practice Configuration). For a list of valid data types, refer to the Databricks documentation: `SQL data types <https://docs.databricks.com/en/sql/language-manual/sql-ref-datatypes.html>`_.
   * - **nullable**
     - A boolean that indicates whether the column can contain null values.
   * - **metadata**
     - A JSON object that contains the metadata for the column.
   * - **metadata.mapping**
     - A JSON object that contains the mapping configuration for the column.
   * - **metadata.mapping.type**
     - The type of mapping to be applied to the column. See below for supported types.
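   * - **metadata.mapping.payload**
     - The value that produces the column's contents. Its meaning depends on ``metadata.mapping.type``: a SQL expression for ``sql``, or an attribute name for ``pipeline_detail``.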
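
The field rules in the table above can be checked before deployment. The following Python sketch validates a flat configuration, where every top-level column carries its own mapping. The functions ``validate_field`` and ``validate_config`` are hypothetical: the checks reflect this page rather than the Framework's actual validation, and nested ``struct`` or ``map`` columns are deliberately skipped.

.. code-block:: python

    from typing import Any

    # Mapping types documented in the Mapping Types table.
    SUPPORTED_MAPPING_TYPES = {"sql", "pipeline_detail"}


    def validate_field(field: dict[str, Any]) -> list[str]:
        """Return a list of problems found in one top-level column definition."""
        errors: list[str] = []
        name = field.get("name")
        if not isinstance(name, str) or not name:
            errors.append("a field is missing a non-empty 'name'")
            name = "<unnamed>"
        if "type" not in field:
            errors.append(f"{name}: missing 'type'")
            return errors
        if not isinstance(field["type"], str):
            # Nested struct/map types are not checked by this flat sketch.
            return errors
        if not isinstance(field.get("nullable"), bool):
            errors.append(f"{name}: 'nullable' must be true or false")
        mapping = (field.get("metadata") or {}).get("mapping")
        if not isinstance(mapping, dict):
            errors.append(f"{name}: missing 'metadata.mapping'")
            return errors
        if mapping.get("type") not in SUPPORTED_MAPPING_TYPES:
            errors.append(f"{name}: unsupported mapping type {mapping.get('type')!r}")
        payload = mapping.get("payload")
        if not isinstance(payload, str) or not payload:
            errors.append(f"{name}: 'metadata.mapping.payload' must be a non-empty string")
        return errors


    def validate_config(config: dict[str, Any]) -> list[str]:
        """Check the top-level struct wrapper, then every field inside it."""
        if config.get("type") != "struct" or not isinstance(config.get("fields"), list):
            return ['top level must be {"type": "struct", "fields": [...]}']
        errors: list[str] = []
        for field in config["fields"]:
            errors.extend(validate_field(field))
        return errors

An empty list means no problems were found; each string otherwise names the offending column and rule.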

**Mapping Types**

.. list-table::
   :header-rows: 1

   * - Type
     - Description
     - Payload
   * - **sql**
     - The column value is derived by evaluating the SQL expression provided in the payload.
     - Any valid SQL function or expression, per the Databricks `SQL Language Reference <https://docs.databricks.com/en/sql/language-manual/index.html>`_.
   * - **pipeline_detail**
     - The column value is taken from a single attribute of the running pipeline's details.
     - The name of one of the following attributes:

       * **pipeline_id**: The pipeline ID
       * **pipeline_update_id**: The pipeline update ID
       * **pipeline_layer**: The medallion layer for the pipeline
       * **start_utc_timestamp**: The start timestamp of the pipeline in UTC
       * **pipeline_catalog**: The SDP target catalog
       * **pipeline_schema**: The SDP target schema/database
       * **workspace_env**: The workspace environment
       * **logical_env**: The logical environment
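
One way to picture how the two mapping types differ is to translate each mapping into a SQL expression. In the Python sketch below, ``sql`` payloads are passed through unchanged, while ``pipeline_detail`` payloads are looked up in a dictionary of pipeline details and embedded as quoted string literals. The function names, the ``pipeline_details`` dictionary, and the literal-embedding approach are assumptions made for illustration; the Framework may resolve these values differently.

.. code-block:: python

    from typing import Any

    # Attribute names taken from the pipeline_detail row of the table above.
    PIPELINE_DETAIL_ATTRIBUTES = frozenset({
        "pipeline_id",
        "pipeline_update_id",
        "pipeline_layer",
        "start_utc_timestamp",
        "pipeline_catalog",
        "pipeline_schema",
        "workspace_env",
        "logical_env",
    })


    def sql_string_literal(value: str) -> str:
        """Quote a Python string as a SQL string literal using backslash escapes."""
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"


    def mapping_to_sql(mapping: dict[str, str], pipeline_details: dict[str, Any]) -> str:
        """Turn a metadata.mapping object into a SQL expression string."""
        kind, payload = mapping["type"], mapping["payload"]
        if kind == "sql":
            # The payload is already a SQL expression, e.g. "current_timestamp()".
            return payload
        if kind == "pipeline_detail":
            if payload not in PIPELINE_DETAIL_ATTRIBUTES:
                raise ValueError(f"unknown pipeline_detail attribute: {payload!r}")
            # In this sketch, pipeline details are treated as constants for the
            # whole update, so they are embedded as literals, not evaluated per row.
            return sql_string_literal(str(pipeline_details[payload]))
        raise ValueError(f"unsupported mapping type: {kind!r}")

The design choice shown here is that a ``sql`` mapping is evaluated by the engine for each record, whereas a ``pipeline_detail`` mapping resolves to the same value for every record written by one pipeline update.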

.. Note::
    For existing tables/pipelines, the operational metadata schema can be added or modified without a full refresh only if the table schema is not enforced and schema evolution is allowed. In that case, only records written after the change will have correct values in the added or modified fields. Otherwise, the pipeline must be fully refreshed to reprocess all data and apply the operational metadata.

Best Practice Configuration
---------------------------

It is recommended that you group your metadata columns into logically organized ``struct`` or ``map`` columns to avoid cluttering the table schema. A configuration that follows this practice uses the schema below:

.. code-block:: json

    {
        "type": "struct",
        "fields": [
            {
                "name": "<column_name>",
                "type": {
                    "type": "struct",
                    "fields": [
                        {
                            "name": "<sub_column_name>",
                            "type": "<data_type>",
                            "nullable": true/false,
                            "metadata": {
                                "mapping": {
                                    "type": "<mapping_type>",
                                    "payload": "<mapping_payload>"
                                }
                            }
                        },
                        ...
                    ]
                }
            },
            ...
        ]
    }
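
To see which leaf columns a grouped configuration populates, the nested schema can be walked recursively. The Python sketch below yields a dotted path (for example ``meta_load_details.pipeline_update_id``) together with the data type and mapping of each leaf column. The function ``iter_mapped_leaves`` is hypothetical; it assumes, as in the schema above, that only leaf columns carry a ``metadata.mapping`` and that grouping columns are ``struct`` types.

.. code-block:: python

    from typing import Any, Iterator


    def iter_mapped_leaves(
        fields: list[dict[str, Any]], prefix: str = ""
    ) -> Iterator[tuple[str, Any, dict[str, str]]]:
        """Yield (dotted_path, data_type, mapping) for every leaf column."""
        for field in fields:
            path = f"{prefix}{field['name']}"
            data_type = field["type"]
            if isinstance(data_type, dict) and data_type.get("type") == "struct":
                # Grouping column: recurse into its sub-fields, extending the path.
                yield from iter_mapped_leaves(data_type["fields"], prefix=f"{path}.")
            else:
                # Leaf column: it must carry its own metadata.mapping.
                yield path, data_type, field["metadata"]["mapping"]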

Examples
--------

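The example below shows the default configuration for a generic bronze and silver layer deployment. It adds a single ``meta_load_details`` struct column that records the insert timestamp and the pipeline update ID for each record.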

.. code-block:: json

    {
        "type": "struct",
        "fields": [
            {
                "name": "meta_load_details",
                "type": {
                    "type": "struct",
                    "fields": [
                        {
                            "name": "record_insert_timestamp",
                            "type": "timestamp",
                            "nullable": false,
                            "metadata": {
                                "mapping": {
                                    "type": "sql",
                                    "payload": "current_timestamp()"
                                }
                            }
                        },
                        {
                            "name": "pipeline_update_id",
                            "type": "string",
                            "nullable": false,
                            "metadata": {
                                "mapping": {
                                    "type": "pipeline_detail",
                                    "payload": "pipeline_update_id"
                                }
                            }
                        }
                    ]
                }
            }
        ]
    }
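
To make the effect on the table schema concrete, the Python sketch below renders the example configuration as a Databricks SQL column definition. The helper names are hypothetical; the sketch assumes that an omitted ``nullable`` key means the column is nullable, and that ``map`` types use Spark's ``keyType`` / ``valueType`` JSON keys.

.. code-block:: python

    from typing import Any

    # The example configuration above, as a Python dict.
    EXAMPLE_CONFIG: dict[str, Any] = {
        "type": "struct",
        "fields": [
            {
                "name": "meta_load_details",
                "type": {
                    "type": "struct",
                    "fields": [
                        {
                            "name": "record_insert_timestamp",
                            "type": "timestamp",
                            "nullable": False,
                            "metadata": {"mapping": {"type": "sql", "payload": "current_timestamp()"}},
                        },
                        {
                            "name": "pipeline_update_id",
                            "type": "string",
                            "nullable": False,
                            "metadata": {"mapping": {"type": "pipeline_detail", "payload": "pipeline_update_id"}},
                        },
                    ],
                },
            }
        ],
    }


    def not_null(field: dict[str, Any]) -> str:
        # Assumption: an omitted "nullable" key means the column is nullable.
        return "" if field.get("nullable", True) else " NOT NULL"


    def type_to_ddl(data_type: Any) -> str:
        """Render a JSON data type as a Databricks SQL type string."""
        if isinstance(data_type, str):
            return data_type.upper()
        kind = data_type.get("type")
        if kind == "struct":
            parts = [
                f"{field['name']}: {type_to_ddl(field['type'])}{not_null(field)}"
                for field in data_type["fields"]
            ]
            return f"STRUCT<{', '.join(parts)}>"
        if kind == "map":
            return f"MAP<{type_to_ddl(data_type['keyType'])}, {type_to_ddl(data_type['valueType'])}>"
        raise ValueError(f"unsupported data type: {data_type!r}")


    def config_to_ddl(config: dict[str, Any]) -> list[str]:
        """Return one 'name TYPE' column definition per top-level metadata column."""
        return [
            f"{field['name']} {type_to_ddl(field['type'])}{not_null(field)}"
            for field in config["fields"]
        ]


    for column in config_to_ddl(EXAMPLE_CONFIG):
        print(column)

Running it prints a single column definition, ``meta_load_details STRUCT<record_insert_timestamp: TIMESTAMP NOT NULL, pipeline_update_id: STRING NOT NULL>``, which is the one column this configuration adds to every target table in the layer.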