SQL Source
==========

.. list-table::
   :header-rows: 0

   * - **Applies To:**
     - :bdg-info:`Pipeline Bundle`
   * - **Configuration Scope:**
     - :bdg-info:`Data Flow Spec`
   * - **Databricks Docs:**
     - NA

Overview
--------
You can specify a SQL query as a source type in your Data Flow Specs. SQL sources give you the flexibility to express more complex
transformations where needed, without overcomplicating the Framework.

Sample Bundle
-------------

A sample is available in the ``gold_sample`` bundle in the ``src/dataflows/stream_static_samples`` folder and can be seen in the 
``dim_customer_sql_main.json`` file.

Configuration
-------------

**SQL Query Definition**

To define a SQL query, you need to create a ``dml`` folder under the base folder for your given Data Flow Spec. 
You can then create a ``.sql`` file for your query under this folder. 

For example:

  ::

      my_pipeline_bundle/
      ├── src/
      │   ├── dataflows
      │   │   ├── use_case_1
      │   │   │   ├── my_data_flow_spec_main.json
      │   │   │   ├── dml
      │   │   │   │   └── my_query.sql
      │   │   │   ├── expectations
      │   │   │   ├── python_functions
      │   │   │   └── schemas


Your file can contain any SQL supported by Databricks, but it must ultimately return a dataset as a single query.
You can use CTEs, subqueries, joins, and so on.
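
As a minimal sketch of what such a file might contain, the query below combines a CTE with a join and still returns a single dataset.
The table and column names (``customers``, ``orders``, ``customer_id``, and so on) are hypothetical and are not part of the sample bundle.

.. code-block:: sql

    -- Hypothetical tables and columns, for illustration only.
    -- The CTE computes the most recent order date per customer.
    WITH latest_orders AS (
        SELECT
            customer_id,
            MAX(order_date) AS last_order_date
        FROM orders
        GROUP BY customer_id
    )
    -- The final SELECT is the single dataset returned by the file.
    SELECT
        c.customer_id,
        c.customer_name,
        o.last_order_date
    FROM customers AS c
    LEFT JOIN latest_orders AS o
        ON c.customer_id = o.customer_id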

**Substitution Variables**

You can use substitution variables in your SQL query with the ``{var}`` syntax.
These are substituted as described in the :doc:`feature_substitutions` documentation.

For example:

.. code-block:: sql

    SELECT * FROM {bronze_schema}.my_table
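
As an illustration, assuming ``bronze_schema`` were configured with the hypothetical value ``main.bronze``, the query above would
resolve to something like:

.. code-block:: sql

    -- Hypothetical resolved form; the actual value depends on your substitution configuration.
    SELECT * FROM main.bronze.my_table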

**Referencing the SQL Source in a Data Flow Spec**

To reference the SQL source in a Data Flow Spec, you need to specify a SQL source type in your Data Flow Spec.
Refer to the :doc:`dataflow_spec_ref_source_details` section of the :doc:`dataflow_spec_reference` documentation for more information.