The Samples
The Framework comes with extensive samples that demonstrate the use of the framework and Lakeflow concepts. At the time of writing, samples are organized into the following bundles:
- Bronze
- Silver
- Gold
- Test Data and Orchestrator
- TPC-H
The samples broadly break down into the following:
| Sample Type | Folder | Description |
|---|---|---|
| Base and Pattern Samples | | Bronze, Silver and Gold samples that demonstrate the patterns and data examples used in the Data Flow and Pipeline Patterns section of the documentation |
| Feature Samples | | One sample per key feature |
| Kafka Samples | | Base Kafka, Confluent schema registry and SQL off Kafka samples |
| TPC-H Sample | Separate bundle for TPC-H samples | Based on the TPC-H schema in the UC samples catalog, reverse engineered to demonstrate an end-to-end streaming data warehouse |
Deploying the Samples
The samples can be deployed using the scripts located in the samples directory:
- `deploy.sh`: Deploys all the samples except for TPC-H.
- `deploy_bronze.sh`: Deploys only the bronze samples.
- `deploy_silver.sh`: Deploys only the silver samples.
- `deploy_gold.sh`: Deploys only the gold samples.
- `deploy_orchestrator.sh`: Deploys only the test data and orchestrator bundle.
- `deploy_tpch.sh`: Deploys only the TPC-H sample.
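For example, a bronze-only deployment might look like the sketch below, assuming the targeted scripts accept the same flags as `deploy.sh` (documented under Single Command Line Deployment); the username, host and suffix are illustrative values:

```bash
cd samples
# Deploy only the bronze samples; flags mirror deploy.sh (an assumption,
# check the script's usage output if yours differs).
./deploy_bronze.sh -u jane.doe@company.com -h https://company.cloud.databricks.com -l _jd
```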
Prerequisites:
- Databricks CLI installed and configured
- Lakeflow framework already deployed to your workspace (see Deploy the Framework)
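Before deploying, you can sanity-check the CLI prerequisite with standard Databricks CLI commands; a minimal sketch (the profile name is an example):

```bash
# Verify the Databricks CLI is installed and on your PATH
databricks --version

# Verify the CLI can authenticate against your target workspace
databricks current-user me --profile DEFAULT
```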
Interactive Deployment
Navigate to the samples directory in the root of the Framework repository:

```bash
cd samples
```

Run the desired deploy script:

```bash
./deploy.sh
```

Follow the prompts to deploy the samples.
- Databricks username: Your Databricks username in the workspace you are deploying to, e.g. `jane.doe@company.com`.
- Databricks workspace: The full URL of the workspace you are deploying to, e.g. `https://company.cloud.databricks.com`.
- Databricks CLI profile: The Databricks CLI profile you want to use for the deployment. Default: `DEFAULT`.
- Select Compute: Select between Classic/Enhanced or Serverless compute (0=Enhanced, 1=Serverless). Default: `1`.
- UC Catalog: The Unity Catalog you want to use for the deployment. Default: `main`.
- Schema Namespace: The first part of the name for the bronze, silver and gold schemas. Default: `lakeflow_samples`.
- Logical environment: The logical environment you want to use for the deployment, e.g. `_test`.
Important
Always specify a logical environment when deploying the samples; as long as the logical environment is unique, this ensures you don't overwrite anyone else's existing samples in the workspace.
Suggested naming:
- Your initials, e.g. Jane Doe would be `_jd`
- A Story ID, e.g. `123456` would be `_123456`
- Your client name, e.g. Company would be `_client`
- Others: business unit, team name, project name, etc.
Once deployment is complete, you can find the deployed bundles under `/Users/<username>/.bundle/`.
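If you want to confirm the deployment from the command line, a minimal sketch using the Databricks CLI workspace commands (replace the placeholder with your own username):

```bash
# List the deployed sample bundles in your workspace home
databricks workspace list /Users/<username>/.bundle/
```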
Single Command Line Deployment
Navigate to the samples directory in the root of the Framework repository:

```bash
cd samples
```

Run the desired deploy script with the required parameters:

```bash
./deploy.sh -u <databricks_username> -h <workspace_host> [-p <profile>] [-c <compute>] [-l <logical_env>] [--catalog <catalog>] [--schema_namespace <schema_namespace>]
```

Parameters:

- `-u, --user`: Your Databricks username (required)
- `-h, --host`: Databricks workspace host URL (required)
- `-p, --profile`: Databricks CLI profile (optional). Default: `DEFAULT`.
- `-c, --compute`: The type of compute to use (0=Enhanced, 1=Serverless). Default: `1`.
- `-l, --logical_env`: Logical environment suffix for schema names (optional). Default: `_test`.
- `--catalog`: Unity Catalog name (optional). Default: `main`.
- `--schema_namespace`: Override the first part of the name for the bronze, silver and gold schemas (optional). Default: `lakeflow_samples`.
For example:
```bash
./deploy.sh -u jane.doe@company.com -h https://company.cloud.databricks.com -l _jd -c 1
```
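If you deploy repeatedly, a small wrapper can pin your personal defaults; a hypothetical sketch that uses only the documented flags (the username, host and default suffix are examples):

```bash
#!/usr/bin/env bash
# Hypothetical convenience wrapper around deploy.sh: username, host and
# profile are pinned, so the logical environment suffix is the only argument.
set -euo pipefail

LOGICAL_ENV="${1:-_jd}"  # e.g. ./my_deploy.sh _123456

./deploy.sh \
  -u jane.doe@company.com \
  -h https://company.cloud.databricks.com \
  -p DEFAULT \
  -c 1 \
  -l "$LOGICAL_ENV"
```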
Once deployment is complete, you can find the deployed bundles under `/Users/<username>/.bundle/`.
Using the Samples
The Test Data and Orchestrator bundle includes:
- Test data initialization and load simulation
- Multiple jobs to simulate end-to-end runs of the samples
Jobs
After deployment you should find the following jobs in your workspace:
- Lakeflow Framework Samples - Run 1 - Load and Schema Initialization
- Lakeflow Framework Samples - Run 2 - Load
- Lakeflow Framework Samples - Run 3 - Load
- Lakeflow Framework Samples - Run 4 - Load
These will be prefixed with the target and your username and suffixed with the logical environment you provided when deploying the samples.
For example:
[dev jane_doe] Lakeflow Framework Samples - Run 1 - Load and Schema Initialization (_jd)
To execute the samples, run the jobs in order; this simulates an end-to-end run of the samples over the test data.
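If you prefer the command line to the Jobs UI, the runs can also be triggered via the Databricks CLI jobs commands; a sketch assuming a recent CLI version, with placeholder job IDs you look up first:

```bash
# Find the numeric IDs of the deployed sample jobs
databricks jobs list | grep "Lakeflow Framework Samples"

# Trigger each run in order; make sure each run completes before starting
# the next (add your own polling if your CLI version returns immediately
# instead of waiting for the run to finish).
databricks jobs run-now <run_1_job_id>
databricks jobs run-now <run_2_job_id>
```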
Pipelines
You can, of course, also execute individual pipelines; these follow a similar naming convention, with Lakeflow Samples in the name.
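Individual pipelines can likewise be started from the command line; a sketch, assuming the pipelines command group of a recent Databricks CLI (the pipeline ID is a placeholder):

```bash
# Find the pipeline IDs for the deployed samples
databricks pipelines list-pipelines | grep "Lakeflow Samples"

# Start an update for a single pipeline
databricks pipelines start-update <pipeline_id>
```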
Destroying the Samples
To destroy the samples, use the `destroy.sh` script with the command below.
```bash
./destroy.sh -h <workspace_host> [-p <profile>] [-l <logical_env>]
```
Parameters:
- `-h, --host`: Databricks workspace host URL (required)
- `-p, --profile`: Databricks CLI profile (optional). Default: `DEFAULT`.
- `-l, --logical_env`: Logical environment suffix for schema names (optional)
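For example, tearing down the samples deployed with the `_jd` logical environment earlier:

```bash
./destroy.sh -h https://company.cloud.databricks.com -l _jd
```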
TPC-H Sample
The TPC-H sample is based on the TPC-H schema in the UC samples catalog and reverse engineered to demonstrate an end-to-end streaming data warehouse.
To deploy the TPC-H sample, use the `deploy_tpch.sh` script following the same methods specified above.
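For example, a single-command deployment mirroring the earlier `deploy.sh` invocation (the values are the same example values used above):

```bash
cd samples
./deploy_tpch.sh -u jane.doe@company.com -h https://company.cloud.databricks.com -l _jd
```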
This sample is currently still being built, with an initial cut targeted for Sept 2025.