The Samples
###########

The Framework comes with extensive samples that demonstrate the use of the framework and Lakeflow concepts. At the time of writing, samples are organized into the following bundles:

* Bronze
* Silver
* Gold
* Test Data and Orchestrator
* TPC-H

The samples broadly break down into the following:

.. list-table::
   :header-rows: 1

   * - Sample Type
     - Folder
     - Description
   * - **Base and Pattern Samples**
     - - ``/src/dataflows/base_samples``
       - ``/src/dataflows/``
     - Bronze, Silver and Gold samples that demonstrate the patterns and data examples used in the :doc:`patterns` section of the documentation
   * - **Feature Samples**
     - ``/src/dataflows/feature_samples``
     - One sample per key feature
   * - **Kafka Samples**
     - ``/src/dataflows/kafka_samples``
     - Base Kafka, Confluent Schema Registry and SQL-off-Kafka samples
   * - **TPC-H Sample**
     - Separate bundle for the TPC-H samples
     - Based on the TPC-H schema in the UC ``samples`` catalog, reverse engineered to demonstrate an end-to-end streaming data warehouse

.. _local_sample_deployment:

Deploying the Samples
---------------------

The samples can be deployed using the scripts located in the ``samples`` directory:

* ``deploy.sh``: Deploys all the samples except TPC-H.
* ``deploy_bronze.sh``: Deploys only the bronze samples.
* ``deploy_silver.sh``: Deploys only the silver samples.
* ``deploy_gold.sh``: Deploys only the gold samples.
* ``deploy_orchestrator.sh``: Deploys only the test data and orchestrator bundle.
* ``deploy_tpch.sh``: Deploys only the TPC-H sample.

Prerequisites:

* Databricks CLI installed and configured
* Lakeflow framework already deployed to your workspace (see :doc:`deploy_framework`)

Interactive Deployment
^^^^^^^^^^^^^^^^^^^^^^

1. Navigate to the ``samples`` directory in the root of the Framework repository:

   .. code-block:: console

      cd samples

2. Run the desired deploy script:

   .. code-block:: console

      ./deploy.sh

3. Follow the prompts to deploy the samples.

   * **Databricks username**: Your Databricks username in the workspace you are deploying to, e.g. ``jane.doe@company.com``.
   * **Databricks workspace**: The full URL of the workspace you are deploying to, e.g. ``https://company.cloud.databricks.com``.
   * **Databricks CLI profile**: The Databricks CLI profile you want to use for the deployment. Default: ``DEFAULT``.
   * **Select Compute**: Select between Classic/Enhanced or Serverless compute (0=Enhanced, 1=Serverless). Default: ``1``.
   * **UC Catalog**: The Unity Catalog you want to use for the deployment. Default: ``main``.
   * **Schema Namespace**: The first part of the name for the bronze, silver and gold schemas. Default: ``lakeflow_samples``.
   * **Logical environment**: The logical environment you want to use for the deployment, e.g. ``_test``.

   .. important::
      Always specify a logical environment when deploying the samples; as long as it is unique, this ensures you don't overwrite anyone else's existing samples in the workspace.

      Suggested naming:

      * Your initials, e.g. Jane Doe would be ``_jd``
      * A Story ID, e.g. ``123456`` would be ``_123456``
      * Your client name, e.g. Company would be ``_client``
      * Others: business unit, team name, project name, etc.

4. Once deployment is complete, you can find the deployed bundles under ``/Users/<username>/.bundle/``.
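To sanity-check the deployment, you can list that folder with the Databricks CLI. A minimal sketch, assuming your username is ``jane.doe@company.com`` and you deployed with the ``DEFAULT`` profile:

.. code-block:: console

   # List the deployed bundle folders under your user's bundle root
   databricks workspace list /Users/jane.doe@company.com/.bundle --profile DEFAULT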
Single Command-Line Deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Navigate to the ``samples`` directory in the root of the Framework repository:

   .. code-block:: console

      cd samples

2. Run the desired deploy script with the required parameters:

   .. code-block:: console

      ./deploy.sh -u <user> -h <host> [-p <profile>] [-c <compute>] [-l <logical_env>] [--catalog <catalog>] [--schema_namespace <schema_namespace>]

   Parameters:

   * ``-u, --user``: Your Databricks username (required)
   * ``-h, --host``: Databricks workspace host URL (required)
   * ``-p, --profile``: Databricks CLI profile (optional). Default: ``DEFAULT``.
   * ``-c, --compute``: The type of compute to use (0=Enhanced, 1=Serverless). Default: ``1``.
   * ``-l, --logical_env``: Logical environment suffix for schema names (optional). Default: ``_test``.
   * ``--catalog``: Unity Catalog name (optional). Default: ``main``.
   * ``--schema_namespace``: Overrides the first part of the name for the bronze, silver and gold schemas (optional). Default: ``lakeflow_samples``.

   For example:

   .. code-block:: console

      ./deploy.sh -u jane.doe@company.com -h https://company.cloud.databricks.com -l _jd -c 1

3. Once deployment is complete, you can find the deployed bundles under ``/Users/<username>/.bundle/``.

Using the Samples
-----------------

Test Data and Orchestrator
~~~~~~~~~~~~~~~~~~~~~~~~~~

The Test Data and Orchestrator bundle includes:

* Test data initialization and load simulation
* Multiple jobs to simulate end-to-end runs of the samples

**Jobs**

After deployment you should find the following jobs in your workspace:

* Lakeflow Framework Samples - Run 1 - Load and Schema Initialization
* Lakeflow Framework Samples - Run 2 - Load
* Lakeflow Framework Samples - Run 3 - Load
* Lakeflow Framework Samples - Run 4 - Load

These will be prefixed with the target and your username, and suffixed with the logical environment you provided when deploying the samples. For example: ``[dev jane_doe] Lakeflow Framework Samples - Run 1 - Load and Schema Initialization (_jd)``

To execute the samples, run the jobs in order to simulate an end-to-end run of the samples over the test data.

**Pipelines**

You can of course also execute individual pipelines; these follow a similar naming convention, with ``Lakeflow Samples`` in the name.

Destroying the Samples
----------------------

To destroy the samples, use the ``destroy.sh`` script with the command specified below.

.. code-block:: console

   ./destroy.sh -h <host> [-p <profile>] [-l <logical_env>]

Parameters:

* ``-h, --host``: Databricks workspace host URL (required)
* ``-p, --profile``: Databricks CLI profile (optional). Default: ``DEFAULT``.
* ``-l, --logical_env``: Logical environment suffix for schema names (optional)

TPC-H Sample
------------

The TPC-H sample is based on the TPC-H schema in the UC ``samples`` catalog, reverse engineered to demonstrate an end-to-end streaming data warehouse. This sample is currently still being built, with an initial cut targeted for Sept 2025.

To deploy the TPC-H sample, use the ``deploy_tpch.sh`` script following the same methods specified above.
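For example, assuming ``deploy_tpch.sh`` accepts the same flags as ``deploy.sh`` (the username, host and logical environment below are illustrative):

.. code-block:: console

   # Deploy the TPC-H sample bundle with a unique logical environment suffix
   ./deploy_tpch.sh -u jane.doe@company.com -h https://company.cloud.databricks.com -l _jd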