Operational Metadata
Applies To: Framework Bundle
Configuration Scope: Global
Databricks Docs: NA
Operational metadata is data that describes a given data load and pipeline execution, for example:
Ingestion date
Execution date
Update ID / Job ID
File name (for data ingested from files)
The Operational Metadata feature of the Framework adds metadata columns to all target tables generated by the pipelines you have defined in the Framework.
Best Practice
You can define and add as many metadata columns as you require. However, it is recommended that you group these columns into logically organized struct or map columns to avoid cluttering the table schema.
Configuration
Layer config file: src/config/operational_metadata_<layer>.json
The layer suffix of the file name must match one of the layers defined in your pipeline / Data Flow Spec configurations, e.g. bronze, silver, gold.
The operational columns defined in the JSON configuration will be applied to all tables in the corresponding layer.
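The naming convention above can be sketched as a small helper. This is a minimal illustration, not the Framework's own loader: the function name `operational_metadata_path` and the `KNOWN_LAYERS` set are hypothetical, and the set should be replaced with the layers defined in your own Data Flow Specs.

```python
from pathlib import PurePosixPath

# Hypothetical set of layers; replace with the layers from your own configuration.
KNOWN_LAYERS = {"bronze", "silver", "gold"}


def operational_metadata_path(layer: str, config_dir: str = "src/config") -> PurePosixPath:
    """Return the expected operational metadata config path for a layer."""
    if layer not in KNOWN_LAYERS:
        raise ValueError(f"Unknown layer: {layer!r}")
    # The layer name is used as the file name suffix.
    return PurePosixPath(config_dir) / f"operational_metadata_{layer}.json"


bronze_path = operational_metadata_path("bronze")
print(bronze_path)
```

A layer name that is not in the known set raises a `ValueError`, which surfaces a mismatch between the file suffix and the configured layers early.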
Configuration Schema
The operational metadata configuration file must follow the schema below:
{
    "type": "struct",
    "fields": [
        {
            "name": "<column_name>",
            "type": {
                "type": "<data_type>",
                "nullable": true/false,
                "metadata": {
                    "mapping": {
                        "type": "<mapping_type>",
                        "payload": "<mapping_payload>"
                    }
                }
            }
        },
        ...
    ]
}
| Field | Description |
|---|---|
| name | The name of the operational metadata column. |
| type | The data_type of the column. For a list of valid data types, refer to the Databricks documentation: https://docs.databricks.com/en/sql/language-manual/sql-ref-datatypes.html |
| nullable | A boolean that indicates whether the field can be null. |
| metadata | A JSON object that contains the metadata for the column. |
| metadata.mapping | A JSON object that contains the mapping configuration for the column. |
| metadata.mapping.type | The type of mapping to be applied to the column. See below for supported types. |
| metadata.mapping.payload | The payload for the mapping, for example the SQL string to be applied to the column. See below for the payload expected by each mapping type. |
Mapping Types
| Mapping Type | Description | Payload |
|---|---|---|
| sql | The column values will be derived by executing the SQL string provided in the payload. | Any valid SQL function or expression, per the Databricks SQL Language Reference. |
| pipeline_detail | The name of any single | The following attributes are available: |
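To make the difference between the two mapping types concrete, the sketch below shows one way a mapping entry could be interpreted. It is an assumption about the behaviour, not the Framework's implementation: the function `resolve_mapping` and the `pipeline_details` dictionary are hypothetical names, and the only attribute used is `pipeline_update_id`, taken from the example configuration on this page.

```python
from typing import Dict, Tuple


def resolve_mapping(mapping: Dict[str, str], pipeline_details: Dict[str, str]) -> Tuple[str, str]:
    """Interpret one `metadata.mapping` entry.

    Returns ("sql_expression", <expression>) for `sql` mappings and
    ("literal", <value>) for `pipeline_detail` mappings.
    """
    mapping_type = mapping["type"]
    payload = mapping["payload"]

    if mapping_type == "sql":
        # The payload is passed through unchanged as a SQL expression.
        return ("sql_expression", payload)

    if mapping_type == "pipeline_detail":
        # The payload names a single attribute of the running pipeline.
        if payload not in pipeline_details:
            raise KeyError(f"Unknown pipeline detail attribute: {payload!r}")
        return ("literal", str(pipeline_details[payload]))

    raise ValueError(f"Unsupported mapping type: {mapping_type!r}")


# Hypothetical pipeline details, for illustration only.
details = {"pipeline_update_id": "example-update-id"}

print(resolve_mapping({"type": "sql", "payload": "current_timestamp()"}, details))
print(resolve_mapping({"type": "pipeline_detail", "payload": "pipeline_update_id"}, details))
```

The key distinction is that a `sql` payload is evaluated per record, whereas a `pipeline_detail` payload is looked up once per pipeline execution.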
Note
For existing tables/pipelines, the operational metadata schema can be added or modified only if the table schema is not enforced and schema evolution is allowed (in this case, only new records will have the correct values in the added/modified fields). Otherwise the pipeline has to be fully refreshed to reprocess all data and apply operational metadata.
Best Practice Configuration
It is recommended that you group your metadata columns into logically organized struct or map columns to avoid cluttering the table schema, as described by the configuration schema below:
{
    "type": "struct",
    "fields": [
        {
            "name": "<column_name>",
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "name": "<sub_column_name>",
                        "type": {
                            "type": "<data_type>",
                            "nullable": true/false,
                            "metadata": {
                                "mapping": {
                                    "type": "<mapping_type>",
                                    "payload": "<mapping_payload>"
                                }
                            }
                        }
                    },
                    ...
                ]
            },
            ...
        },
        ...
    ]
}
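If you generate these files programmatically, the grouping can be sketched as a helper that wraps a list of sub-field definitions into a single struct column. This is an illustrative sketch only: `group_into_struct` is a hypothetical name, the sub-fields are treated as opaque dictionaries, and the sample sub-field uses the field layout shown in the Examples section of this page.

```python
import json
from typing import Any, Dict, List


def group_into_struct(column_name: str, sub_fields: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap sub-field definitions into one struct column inside a root struct."""
    if not sub_fields:
        raise ValueError("At least one sub-field is required")
    return {
        "type": "struct",
        "fields": [
            {
                "name": column_name,
                "type": {"type": "struct", "fields": list(sub_fields)},
            }
        ],
    }


# Sub-field definition copied from the Examples section of this page.
record_insert_timestamp = {
    "name": "record_insert_timestamp",
    "type": "timestamp",
    "nullable": False,
    "metadata": {"mapping": {"type": "sql", "payload": "current_timestamp()"}},
}

config = group_into_struct("meta_load_details", [record_insert_timestamp])
print(json.dumps(config, indent=4))
```

Grouping this way adds a single top-level column to each target table, regardless of how many metadata attributes you track.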
Examples
The example below illustrates the default configuration for a generic bronze and silver layer deployment.
{
    "type": "struct",
    "fields": [
        {
            "name": "meta_load_details",
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "name": "record_insert_timestamp",
                        "type": "timestamp",
                        "nullable": false,
                        "metadata": {
                            "mapping": {
                                "type": "sql",
                                "payload": "current_timestamp()"
                            }
                        }
                    },
                    {
                        "name": "pipeline_update_id",
                        "type": "string",
                        "nullable": false,
                        "metadata": {
                            "mapping": {
                                "type": "pipeline_detail",
                                "payload": "pipeline_update_id"
                            }
                        }
                    }
                ]
            }
        }
    ]
}
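To check which columns a configuration like the one above would produce, the following sketch walks the nested structure and lists each mapped column with its mapping type and payload. It assumes the field layout used in this example (with `type`, `nullable` and `metadata` side by side on each leaf field); the function `iter_mappings` is a hypothetical helper and not part of the Framework.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def iter_mappings(struct: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str, str]]:
    """Yield (column_path, mapping_type, payload) for every mapped field."""
    for field in struct.get("fields", []):
        path = f"{prefix}{field['name']}"
        field_type = field.get("type")
        # Nested struct columns are walked recursively, using dotted paths.
        if isinstance(field_type, dict) and field_type.get("type") == "struct":
            yield from iter_mappings(field_type, prefix=f"{path}.")
            continue
        mapping = field.get("metadata", {}).get("mapping")
        if mapping:
            yield path, mapping["type"], mapping["payload"]


# Compact copy of the example configuration above.
config = json.loads("""
{
  "type": "struct",
  "fields": [
    {
      "name": "meta_load_details",
      "type": {
        "type": "struct",
        "fields": [
          {"name": "record_insert_timestamp", "type": "timestamp", "nullable": false,
           "metadata": {"mapping": {"type": "sql", "payload": "current_timestamp()"}}},
          {"name": "pipeline_update_id", "type": "string", "nullable": false,
           "metadata": {"mapping": {"type": "pipeline_detail", "payload": "pipeline_update_id"}}}
        ]
      }
    }
  ]
}
""")

mappings = list(iter_mappings(config))
for column_path, mapping_type, payload in mappings:
    print(f"{column_path}: {mapping_type} -> {payload}")
```

For this configuration the walk reports two columns, `meta_load_details.record_insert_timestamp` and `meta_load_details.pipeline_update_id`, both nested under the single `meta_load_details` struct.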