Data Flow Specification Format
Applies To: Pipeline Bundle
Configuration Scope: Framework, Pipeline
Overview
The Framework supports both JSON and YAML formats for defining pipeline specifications, providing flexibility in how you author and maintain your data flow specs, substitution files, secrets files, and other configuration files.
Important
The specification format applies to all configuration files in a Pipeline Bundle, including:
Data flow specifications (main specs and flow groups)
Data quality expectations
Substitution files
Secrets files
This feature allows development teams to choose the format that best suits their workflow and preferences, while maintaining full compatibility with the Framework’s validation and execution capabilities.
Note
Both formats are functionally equivalent and fully interchangeable. The choice between JSON and YAML is purely a matter of preference and workflow requirements.
Configuration
The specification format can be configured at two levels:
Framework Level: Global configuration that applies to all Pipeline Bundles
Pipeline Level: Pipeline-specific configuration that can override the global setting (if allowed)
Framework-Level Configuration
src/config/global.json|yaml
JSON:
{
  "pipeline_bundle_spec_format": {
    "format": "json",
    "allow_override": false
  }
}
YAML:
pipeline_bundle_spec_format:
  format: json
  allow_override: false
| Field | Description | Valid Values | Default |
|---|---|---|---|
| format | The default specification format for all Pipeline Bundles | json, yaml | |
| allow_override | Whether individual Pipeline Bundles can override the global format setting | true, false | |
Pipeline-Level Configuration
src/pipeline_configs/global.json|yaml
JSON:
{
  "pipeline_bundle_spec_format": {
    "format": "yaml"
  }
}
YAML:
pipeline_bundle_spec_format:
  format: yaml
Important
Pipeline-level overrides are only permitted if allow_override is set to true in the Framework’s global configuration. If allow_override is false, attempting to override the format will result in a validation error.
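The precedence rule above can be sketched in Python as follows. This is a minimal illustration and not the Framework's implementation. `resolve_spec_format` and `SpecFormatError` are hypothetical names, and both configurations are assumed to be already-parsed dictionaries. Treating a missing `allow_override` as `false` is also an assumption. The error messages mirror the ones listed under Troubleshooting.

```python
from typing import Optional

VALID_FORMATS = {"json", "yaml"}


class SpecFormatError(ValueError):
    """Hypothetical error type used only in this sketch."""


def _check_format(value: str) -> str:
    # Reject anything other than "json" or "yaml".
    if value not in VALID_FORMATS:
        raise SpecFormatError(f"Invalid pipeline bundle spec format: {value}")
    return value


def resolve_spec_format(framework_cfg: dict, pipeline_cfg: Optional[dict] = None) -> str:
    """Return the effective spec format for one Pipeline Bundle (sketch)."""
    framework = framework_cfg["pipeline_bundle_spec_format"]
    default_format = _check_format(framework["format"])

    override = (pipeline_cfg or {}).get("pipeline_bundle_spec_format")
    if override is None:
        # No pipeline-level setting, so the framework default applies.
        return default_format

    # Assumption: a missing allow_override is treated as false.
    if not framework.get("allow_override", False):
        raise SpecFormatError(
            "Pipeline bundle spec format has been set at global framework level. "
            "Override has been disabled."
        )
    return _check_format(override["format"])


if __name__ == "__main__":
    fw = {"pipeline_bundle_spec_format": {"format": "json", "allow_override": True}}
    pl = {"pipeline_bundle_spec_format": {"format": "yaml"}}
    print(resolve_spec_format(fw, pl))  # prints yaml
```

The check order matters in this sketch. A pipeline-level block is rejected as soon as overrides are disabled, even if its value would otherwise be valid.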
Supported File Types and Naming Conventions
The Framework locates specification files by their file suffix. Every file in a Pipeline Bundle must therefore use the suffix that corresponds to the configured format:
JSON Format
| File Type | File Suffix |
|---|---|
| Main Specifications | *_main.json |
| Flow Group Specifications | |
| Data Quality Expectations | |
| Secrets Files | |
| Substitution Files | |
YAML Format
| File Type | File Suffix |
|---|---|
| Main Specifications | *_main.yaml or *_main.yml |
| Flow Group Specifications | |
| Data Quality Expectations | |
| Secrets Files | |
| Substitution Files | |
Note
The Framework supports both .yaml and .yml extensions for YAML files. Use whichever convention your team prefers, but be consistent within a Pipeline Bundle.
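The sketch below gives a rough picture of suffix-based discovery by collecting the main specification files for a given format. Only the main-spec suffix for YAML (`*_main.yaml`) is confirmed on this page. Treating `*_main.json` as its JSON counterpart is an assumption, and so is the hypothetical `find_main_specs` helper. The other file types would need patterns of their own.

```python
from pathlib import Path

# Assumed suffix patterns for main specifications; .yml is accepted alongside .yaml.
MAIN_SPEC_PATTERNS = {
    "json": ("*_main.json",),
    "yaml": ("*_main.yaml", "*_main.yml"),
}


def find_main_specs(bundle_dir: str, spec_format: str) -> list[Path]:
    """Return main spec files under bundle_dir that match the configured format."""
    root = Path(bundle_dir)
    matches: set[Path] = set()
    for pattern in MAIN_SPEC_PATTERNS[spec_format]:
        # rglob searches bundle_dir and all of its subdirectories.
        matches.update(root.rglob(pattern))
    return sorted(matches)
```

A file whose suffix does not match the configured format is simply not found. This mirrors the "Format Mismatch Errors" symptom described under Troubleshooting.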
Example Specification
The following example shows a data flow specification in both JSON and YAML formats:
JSON:
{
  "dataFlowId": "customer_main",
  "dataFlowGroup": "customers",
  "dataFlowType": "standard",
  "sourceSystem": "sourceA",
  "sourceType": "autoloader",
  "sourceFormat": "json",
  "sourceDetails": {
    "path": "${base_data_dir}/customer_data",
    "readerOptions": {
      "cloudFiles.format": "json",
      "cloudFiles.inferColumnTypes": "true"
    }
  },
  "mode": "stream",
  "targetFormat": "delta",
  "targetDetails": {
    "table": "customer",
    "tableProperties": {
      "delta.enableChangeDataFeed": "true"
    }
  },
  "dataQualityExpectationsEnabled": true,
  "quarantineMode": "on"
}
YAML:
dataFlowId: customer_main
dataFlowGroup: customers
dataFlowType: standard
sourceSystem: sourceA
sourceType: autoloader
sourceFormat: json
sourceDetails:
  path: ${base_data_dir}/customer_data
  readerOptions:
    cloudFiles.format: json
    cloudFiles.inferColumnTypes: 'true'
mode: stream
targetFormat: delta
targetDetails:
  table: customer
  tableProperties:
    delta.enableChangeDataFeed: 'true'
dataQualityExpectationsEnabled: true
quarantineMode: 'on'
Best Practices
Choose One Format Globally: Mixing formats across Pipeline Bundles is technically possible, but standardising on a single format is recommended.
Version Control Considerations: YAML may produce cleaner diffs in version control systems. It is easier to read and has no separator commas, whereas appending a field to a JSON object also changes the preceding line, which must gain a comma.
Validation: Always validate specifications after conversion or manual edits using the Framework’s built-in validation capabilities.
Schema Files: Schema files (*_schema.json) remain in JSON or DDL format regardless of the specification format setting, which does not apply to schema definitions.
Configuration Examples
Example 1: Framework Enforces JSON Format
Framework Configuration (src/config/global.json|yaml):
JSON:
{
  "pipeline_bundle_spec_format": {
    "format": "json",
    "allow_override": false
  }
}
YAML:
pipeline_bundle_spec_format:
  format: json
  allow_override: false
Result: All Pipeline Bundles must use JSON format. Pipeline-level overrides will be rejected.
Example 2: Framework Allows Format Flexibility
Framework Configuration (src/config/global.json|yaml):
JSON:
{
  "pipeline_bundle_spec_format": {
    "format": "json",
    "allow_override": true
  }
}
YAML:
pipeline_bundle_spec_format:
  format: json
  allow_override: true
Pipeline Configuration (src/pipeline_configs/global.json|yaml):
JSON:
{
  "pipeline_bundle_spec_format": {
    "format": "yaml"
  }
}
YAML:
pipeline_bundle_spec_format:
  format: yaml
Result: This specific Pipeline Bundle will use YAML format, while other bundles will default to JSON unless explicitly overridden.
Example 3: Framework Defaults to YAML
Framework Configuration (src/config/global.json|yaml):
JSON:
{
  "pipeline_bundle_spec_format": {
    "format": "yaml",
    "allow_override": false
  }
}
YAML:
pipeline_bundle_spec_format:
  format: yaml
  allow_override: false
Result: All Pipeline Bundles must use YAML format. This is useful when migrating an entire organization to YAML.
Troubleshooting
Format Mismatch Errors
Problem: Framework reports that files cannot be found or loaded.
Solution:
- Verify the format setting in both Framework and Pipeline configurations
- Ensure file suffixes match the configured format (e.g., *_main.yaml for YAML)
- Check that all files in the bundle use consistent naming conventions
Override Not Permitted
Problem: Error message: “Pipeline bundle spec format has been set at global framework level. Override has been disabled.”
Solution:
- This occurs when attempting to override the format at Pipeline level when allow_override is false
- Either remove the Pipeline-level configuration or request that allow_override be enabled in the Framework configuration
Invalid Format Value
Problem: Error message: “Invalid pipeline bundle spec format: <value>”
Solution:
- Ensure the format field is set to either "json" or "yaml"
- Check for typos in the configuration file
- Validate the JSON or YAML syntax of the configuration file
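Before a pipeline is re-run, a JSON configuration file can be pre-checked for both of the problems listed above. The sketch is hypothetical: `diagnose_spec_format_config` is not a Framework function. It covers JSON only, so a YAML configuration would need a YAML parser instead of the standard `json` module.

```python
import json
from typing import Optional


def diagnose_spec_format_config(text: str, source: str = "global.json") -> Optional[str]:
    """Return a description of the first problem found, or None if the config looks valid."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        # JSONDecodeError reports the 1-based line and column of the syntax error.
        return f"{source}:{exc.lineno}:{exc.colno}: {exc.msg}"

    block = config.get("pipeline_bundle_spec_format") if isinstance(config, dict) else None
    fmt = block.get("format") if isinstance(block, dict) else None
    if fmt not in ("json", "yaml"):
        return f"{source}: Invalid pipeline bundle spec format: {fmt}"
    return None
```

The syntax check runs first because a file that cannot be parsed has no `format` value to inspect.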
Validation Errors After Conversion
Problem: YAML files fail validation after conversion from JSON.
Solution:
- Validate the YAML syntax and structure
- Check for data type issues (e.g., boolean values should be true/false, not strings)
- Ensure quotes are preserved around string values that look like other types (e.g., "true" vs true). YAML 1.1 parsers such as PyYAML also read unquoted on, off, yes and no as booleans
- Review the specification for any structural issues
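One way to avoid these quoting problems is to let a YAML library perform the conversion and then confirm that the result round-trips. This sketch assumes the third-party PyYAML package (`yaml.safe_dump`, `yaml.safe_load`). `convert_json_spec_to_yaml` is a hypothetical helper, and the Framework's own validation should still be run on its output.

```python
import json

import yaml  # PyYAML (third-party): pip install pyyaml


def convert_json_spec_to_yaml(json_text: str) -> str:
    """Convert a JSON spec to YAML and verify that the data survives a round trip."""
    data = json.loads(json_text)
    # safe_dump quotes strings such as "true" or "on" that would otherwise
    # be read back as booleans; sort_keys=False keeps the original key order.
    yaml_text = yaml.safe_dump(data, sort_keys=False)
    if yaml.safe_load(yaml_text) != data:
        raise ValueError("YAML output does not round-trip to the original JSON data")
    return yaml_text


if __name__ == "__main__":
    spec = '{"dataQualityExpectationsEnabled": true, "quarantineMode": "on"}'
    print(convert_json_spec_to_yaml(spec))
    # For comparison: PyYAML loads an unquoted on as a boolean.
    print(yaml.safe_load("quarantineMode: on"))  # {'quarantineMode': True}
```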
Mixed Format Detection
Problem: Bundle contains both JSON and YAML files with the same base name.
Solution:
- The Framework will load files based on the configured format
- Remove files that don't match the configured format to avoid confusion
- Ensure consistent naming conventions throughout the bundle
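A short script can locate such conflicts by grouping bundle files by base name and reporting every name that exists in both formats. The sketch below is illustrative and uses only the Python standard library. `find_mixed_format_files` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path

JSON_EXTS = {".json"}
YAML_EXTS = {".yaml", ".yml"}


def find_mixed_format_files(bundle_dir: str) -> dict[Path, list[Path]]:
    """Map each base name that has both JSON and YAML files to those files."""
    groups: dict[Path, list[Path]] = defaultdict(list)
    for path in Path(bundle_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() in JSON_EXTS | YAML_EXTS:
            # Group by the path without its extension, e.g. .../customer_main
            groups[path.with_suffix("")].append(path)

    mixed: dict[Path, list[Path]] = {}
    for base, paths in groups.items():
        suffixes = {p.suffix.lower() for p in paths}
        if suffixes & JSON_EXTS and suffixes & YAML_EXTS:
            mixed[base] = sorted(paths)
    return mixed
```

Each reported entry lists the competing files. Only the file matching the configured format needs to be kept.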
See Also
Substitutions - Using substitutions in specifications
Secrets Management - Managing secrets in specifications
Validation - Specification validation