Templates
| Applies To: | Configuration Scope: |
|---|---|
| Pipeline Bundle | Pipeline |
Overview
The Dataflow Spec Templates feature allows data engineers to create reusable templates for dataflow specifications. This significantly reduces code duplication when multiple dataflows share similar structures but differ only in specific parameters, such as table names or columns.
Important
Templates provide a powerful mechanism for standardizing dataflow patterns across your organization while maintaining flexibility for specific implementations.
This feature allows development teams to:
Reduce Code Duplication: Write once, reuse many times
Ensure Consistency: Similar dataflows follow the same structure
Improve Productivity: Quickly create multiple similar specifications
Reduce Errors: Less copy-paste reduces human error
Make Patterns Explicit: Templates make organizational patterns discoverable
Note
Template processing happens during the initialization phase of pipeline execution as the dataflow specs are loaded. Each processed spec is validated using the standard validation process.
How It Works
The template system consists of three main components:
Template Definitions: JSON files containing template definitions with placeholders
Template Dataflow Specifications: A dataflow specification that references a template and provides parameter sets
Template Processing: Framework logic that processes the template dataflow specifications and generates one dataflow spec per parameter set.
Anatomy of a Template Definition
A template definition is a JSON file that defines a reusable dataflow pattern. It consists of three main components:
{
"name": "standard_cdc_template",
"parameters": {
"dataFlowId": {
"type": "string",
"required": true
},
"sourceDatabase": {
"type": "string",
"required": true
},
"sourceTable": {
"type": "string",
"required": true
},
"targetTable": {
"type": "string",
"required": true
}
},
"template": {
"dataFlowId": "${param.dataFlowId}",
"sourceDetails": {
"database": "${param.sourceDatabase}",
"table": "${param.sourceTable}"
},
"targetDetails": {
"table": "${param.targetTable}"
}
}
}
name: standard_cdc_template
parameters:
dataFlowId:
type: string
required: true
sourceDatabase:
type: string
required: true
sourceTable:
type: string
required: true
targetTable:
type: string
required: true
template:
dataFlowId: ${param.dataFlowId}
sourceDetails:
database: ${param.sourceDatabase}
table: ${param.sourceTable}
targetDetails:
table: ${param.targetTable}
Key Components:
| Component | Description |
|---|---|
| name | The unique name for the template. Make this the same as the filename. This is currently a placeholder for future functionality. |
| parameters | An object defining all parameters that can be used in the template. Each parameter has a type and a required flag. |
| template | The dataflow specification template, containing placeholders in the format ${param.<name>}. |
Important
Placeholders can be used in keys, as full values, or as part of a string value.
In JSON specs, placeholders must always be wrapped in quotes: "${param.name}"
File Location:
Template definitions: <dataflow_base_path>/templates/<name>.json
Anatomy of a Template Dataflow Specification
A template dataflow specification is a simplified file that references a template and provides parameter sets for instantiation. Instead of writing full dataflow specs, data engineers create a template reference:
{
"template": "standard_cdc_template",
"parameterSets": [
{
"dataFlowId": "customer_scd2",
"sourceDatabase": "{bronze_schema}",
"sourceTable": "customer_raw",
"targetTable": "customer_scd2"
},
{
"dataFlowId": "customer_address_scd2",
"sourceDatabase": "{bronze_schema}",
"sourceTable": "customer_address_raw",
"targetTable": "customer_address_scd2"
}
]
}
template: standard_cdc_template
parameterSets:
- dataFlowId: customer_scd2
sourceDatabase: '{bronze_schema}'
sourceTable: customer_raw
targetTable: customer_scd2
- dataFlowId: customer_address_scd2
sourceDatabase: '{bronze_schema}'
sourceTable: customer_address_raw
targetTable: customer_address_scd2
Key Components:
| Component | Description |
|---|---|
| template | The filename of the template definition to use (without the file extension). |
| parameterSets | An array of parameter sets. Each object in the array represents one set of parameter values that will generate one complete dataflow specification. Each parameter set must include all required parameters defined in the template definition. |
Important
Each parameter set must include a unique dataFlowId value
The array must contain at least one parameter set
All required parameters from the template definition must be provided in each parameter set
File Location:
Template dataflow specifications follow the standard dataflow specification naming convention: <dataflow_base_path>/dataflows/<dataflow_name>/dataflowspec/*_main.json
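Putting both locations together, a pipeline bundle using the ERP example later on this page might be laid out as follows (illustrative only):

```
src/
├── templates/
│   └── bronze_erp_system_file_ingestion_template.json
└── dataflows/
    └── bronze_erp_system/
        └── dataflowspec/
            └── bronze_erp_system_file_ingestion_main.json
```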
Processing Result:
A template dataflow specification with N parameter sets will generate N complete dataflow specifications at runtime, each validated independently.
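For instance, the first parameter set from the specification above, combined with the standard_cdc_template shown earlier, expands to roughly the following spec (illustration only; the {bronze_schema} substitution passes through unchanged because it is not a ${param.*} placeholder):

```json
{
  "dataFlowId": "customer_scd2",
  "sourceDetails": {
    "database": "{bronze_schema}",
    "table": "customer_raw"
  },
  "targetDetails": {
    "table": "customer_scd2"
  }
}
```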
Template Processing
During the dataflow spec build process, the template processor will:
1. Detect spec files containing a template key
2. Load the referenced template file
3. For each parameter set in parameterSets, create a concrete spec by replacing all ${param.<key>} placeholders
4. Validate each expanded spec using the existing schema validators
5. Return the expanded specs with unique internal identifiers
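The following is a minimal Python sketch of steps 1 through 3, shown only to clarify the expansion idea. It is not the framework's implementation; the function names are hypothetical, and error handling and the validation step are omitted.

```python
import json
import re
from pathlib import Path

# Matches ${param.<name>} placeholders inside keys and string values
PLACEHOLDER = re.compile(r"\$\{param\.([^}]+)\}")


def _substitute(node, params):
    """Recursively replace ${param.<name>} placeholders in keys and values."""
    if isinstance(node, dict):
        return {_substitute(k, params): _substitute(v, params) for k, v in node.items()}
    if isinstance(node, list):
        return [_substitute(item, params) for item in node]
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), node)
    return node


def expand_template_spec(spec_path: Path, templates_dir: Path) -> list[dict]:
    """Expand a template dataflow spec into one concrete spec per parameter set."""
    spec = json.loads(spec_path.read_text())
    if "template" not in spec:
        # Not a template reference: treat it as a regular dataflow spec
        return [spec]
    template = json.loads((templates_dir / f"{spec['template']}.json").read_text())
    return [_substitute(template["template"], params) for params in spec["parameterSets"]]
```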
Example Usage
Example: Basic File Source Ingestion Template
This example shows a template for basic file source ingestion, from a hypothetical source system called “erp_system”.
Template Definition (src/templates/bronze_erp_system_file_ingestion_template.json|yaml):
{
"name": "bronze_erp_system_file_ingestion_template",
"parameters": {
"dataFlowId": {
"type": "string",
"required": true
},
"sourceTable": {
"type": "string",
"required": true
},
"schemaPath": {
"type": "string",
"required": true
},
"targetTable": {
"type": "string",
"required": true
}
},
"template": {
"dataFlowId": "${param.dataFlowId}",
"dataFlowGroup": "bronze_erp_system",
"dataFlowType": "standard",
"sourceSystem": "erp_system",
"sourceType": "cloudFiles",
"sourceViewName": "v_${param.sourceTable}",
"sourceDetails": {
"path": "{landing_erp_file_location}/${param.sourceTable}/",
"readerOptions": {
"cloudFiles.format": "csv",
"header": "true"
},
"schemaPath": "${param.schemaPath}"
},
"mode": "stream",
"targetFormat": "delta",
"targetDetails": {
"table": "${param.targetTable}"
}
}
}
name: bronze_erp_system_file_ingestion_template
parameters:
dataFlowId:
type: string
required: true
sourceTable:
type: string
required: true
schemaPath:
type: string
required: true
targetTable:
type: string
required: true
template:
dataFlowId: ${param.dataFlowId}
dataFlowGroup: bronze_erp_system
dataFlowType: standard
sourceSystem: erp_system
sourceType: cloudFiles
sourceViewName: v_${param.sourceTable}
sourceDetails:
path: '{landing_erp_file_location}/${param.sourceTable}/'
readerOptions:
cloudFiles.format: csv
header: 'true'
schemaPath: ${param.schemaPath}
mode: stream
targetFormat: delta
targetDetails:
table: ${param.targetTable}
Template Dataflow Specification (src/dataflows/bronze_erp_system/dataflowspec/bronze_erp_system_file_ingestion_main.json|yaml):
{
"template": "bronze_erp_system_file_ingestion_template",
"parameterSets": [
{
"dataFlowId": "customer_file_source",
"sourceTable": "customer",
"schemaPath": "customer_schema.json",
"targetTable": "customer"
},
{
"dataFlowId": "customer_address_file_source",
"sourceTable": "customer_address",
"schemaPath": "customer_address_schema.json",
"targetTable": "customer_address"
},
{
"dataFlowId": "supplier_file_source",
"sourceTable": "supplier",
"schemaPath": "supplier_schema.json",
"targetTable": "supplier"
}
]
}
template: bronze_erp_system_file_ingestion_template
parameterSets:
- dataFlowId: customer_file_source
sourceTable: customer
schemaPath: customer_schema.json
targetTable: customer
- dataFlowId: customer_address_file_source
sourceTable: customer_address
schemaPath: customer_address_schema.json
targetTable: customer_address
- dataFlowId: supplier_file_source
sourceTable: supplier
schemaPath: supplier_schema.json
targetTable: supplier
Result: This template dataflow specification generates 3 concrete dataflow specs, one for each parameter set in the parameterSets array.
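For example, the customer_file_source parameter set expands to approximately this concrete spec (illustration only; the {landing_erp_file_location} substitution is not a ${param.*} placeholder, so the template processor leaves it in place):

```json
{
  "dataFlowId": "customer_file_source",
  "dataFlowGroup": "bronze_erp_system",
  "dataFlowType": "standard",
  "sourceSystem": "erp_system",
  "sourceType": "cloudFiles",
  "sourceViewName": "v_customer",
  "sourceDetails": {
    "path": "{landing_erp_file_location}/customer/",
    "readerOptions": {
      "cloudFiles.format": "csv",
      "header": "true"
    },
    "schemaPath": "customer_schema.json"
  },
  "mode": "stream",
  "targetFormat": "delta",
  "targetDetails": {
    "table": "customer"
  }
}
```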
Parameter Types
Parameters support multiple data types and structures, including strings, numbers, booleans, arrays, and objects.
Key Features
Python Function Path Search Priority
The framework resolves Python function path values through an enhanced fallback chain, searching in the following order:
1. In the pipeline bundle base path of the dataflow spec file
2. Under the templates directory of the pipeline bundle
3. Under the extensions directory of the pipeline bundle
4. Under the framework extensions directory
Error Handling
The framework provides clear error messages for common issues:
Missing template file: Lists all searched locations
Missing parameters: Warns about unreplaced placeholders
Invalid JSON: Shows parsing errors with context
Validation errors: Each expanded spec is validated individually
Validation
Each expanded spec is validated using the existing schema validators to ensure correctness.
Template usage specs are validated against the schema at src/schemas/spec_template.json:
template: Required string (template name without the .json extension)
params: Required array with at least one parameter object
Each parameter object must be a dictionary with at least one key-value pair
Unique Identifiers
Generated specs receive unique internal keys in the format path#template_0, path#template_1, etc., to ensure proper tracking and debugging.
Best Practices
Naming Conventions
Template Files: Use descriptive names ending with _template (e.g., standard_cdc_template.json)
Parameter Names: Use clear, descriptive names (e.g., sourceTable instead of st)
Consistency: Maintain consistent naming patterns across related templates
Development and Testing
1. Concrete First: Develop a concrete dataflow spec first, get it working, and then turn it into a template definition.
2. Validation: Always test processed specs by running the pipeline with a small subset of data.
3. Version Control: Track templates in version control to maintain a history of changes.
4. Iterative Development: Start with a simple template and enhance it as patterns emerge.
Maintainability
Template Updates: When updating a template, test all usages to ensure compatibility
Parameter Validation: Document required parameters for each template
Backwards Compatibility: Consider versioning templates if making breaking changes
Limitations
The current template implementation has the following limitations, which may be addressed in future versions:
No Template Sub Components (Blocks): Templates cannot reference other templates or smaller template blocks
No Conditional Logic: Complex conditional logic is not supported (consider using multiple templates)
Note
For complex conditional logic requirements, create multiple templates that represent different scenarios rather than trying to implement logic within a single template.
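For example, rather than a single template that branches on load type, you might maintain two separate definitions and have each template dataflow specification reference the one that applies. Both template names and parameter sets below are hypothetical:

```yaml
# File 1 (hypothetical): specs using an append-only ingestion pattern
template: erp_system_append_ingestion_template
parameterSets:
  - dataFlowId: orders_append
    sourceTable: orders
    targetTable: orders
---
# File 2 (hypothetical): specs using a CDC-style ingestion pattern
template: erp_system_cdc_ingestion_template
parameterSets:
  - dataFlowId: customer_cdc
    sourceTable: customer
    targetTable: customer
```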