Supported Source Types
======================

.. list-table::
   :header-rows: 0

   * - **Applies To:**
     - :bdg-success:`Pipeline Bundle`
   * - **Configuration Scope:**
     - :bdg-success:`Data Flow Spec`
   * - **Databricks Docs:**
     - https://docs.databricks.com/en/delta-live-tables/python-ref.html

The Lakeflow Framework supports multiple source types. Each source type provides specific configuration options to handle different data ingestion scenarios.

Source Types
------------

.. list-table::
   :header-rows: 1
   :widths: 20 100 100

   * - **Type**
     - **Description**
     - **Key Features**
   * - **Batch Files**
     - Reads data in batch mode from UC Volumes or cloud storage locations (e.g., S3, ADLS, GCS). Supports various file formats and provides options for filtering and transforming data during ingestion.
     - - Flexible path-based file access
       - Reader options for different file formats
       - Optional select expressions and where clauses
       - Schema on read support
   * - **Cloud Files**
     - Incrementally reads data as a stream from UC Volumes or cloud storage locations (e.g., S3, ADLS, GCS) via Auto Loader. Supports various file formats and provides options for filtering and transforming data during ingestion. See the sketch in the examples section below.
     - - Flexible path-based file access
       - Reader options for different file formats
       - Optional select expressions and where clauses
       - Schema on read support
   * - **Delta**
     - Connects to existing Delta tables in the metastore, supporting both batch and streaming reads with Change Data Feed (CDF) capabilities.
     - - Database- and table-based access
       - Change Data Feed (CDF) support
       - Optional path-based access
       - Configurable reader options
   * - **Delta Join**
     - Enables joining multiple Delta tables, supporting both streaming and static join patterns.
     - - Multiple source table configuration
       - Stream and static join modes
       - Left and inner join support
       - Flexible join conditions
       - Per-source CDF configuration
   * - **Kafka**
     - Enables reading from Apache Kafka topics for real-time streaming data processing.
     - - Kafka-specific reader options
       - Schema definition support
       - Filtering and transformation support
       - Topic-based configuration
       - Demux and fan-out support
   * - **Python**
     - Allows using a Python function as a data source, providing flexibility for complex data transformations.
     - - Python file-based configuration
       - Functions stored in the ``python_functions`` subdirectory
       - Full Python / PySpark capabilities
       - Detailed configuration: :doc:`feature_python_source`
   * - **SQL**
     - Allows using SQL queries as data sources, providing flexibility for complex data transformations.
     - - SQL file-based configuration
       - Queries stored in the ``dml`` subdirectory
       - Full SQL transformation capabilities
       - Detailed configuration: :doc:`feature_sql_source`

General Data Flow Spec Configuration
------------------------------------

Source and target details are set as attributes when creating your Data Flow Spec; refer to the :doc:`dataflow_spec_reference` documentation for more information:

* :doc:`dataflow_spec_ref_source_details`
* :doc:`dataflow_spec_ref_target_details`

Source Type Configuration Details
---------------------------------

.. toctree::
   :maxdepth: 1

   feature_python_source
   feature_sql_source
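
Illustrative Examples
---------------------

To make the Cloud Files options above concrete, the sketch below shows how the key features from the table (path-based access, reader options, select expressions, and where clauses) could map onto a Delta Live Tables Auto Loader read. This is a minimal sketch, not the framework's implementation: the ``source_details`` key names and the volume path are illustrative placeholders, and ``spark`` is assumed to be the pipeline's SparkSession, as in any DLT Python file. See :doc:`dataflow_spec_ref_source_details` for the authoritative field names.

.. code-block:: python

   import dlt

   # Hypothetical stand-in for a Data Flow Spec source configuration;
   # the key names below are illustrative, not the framework's schema.
   source_details = {
       "path": "/Volumes/main/landing/orders/",
       "reader_options": {"cloudFiles.format": "json"},
       "select_exp": ["order_id", "cast(amount as double) as amount"],
       "where_clause": "amount > 0",
   }

   @dlt.table(name="orders_bronze")
   def orders_bronze():
       # Auto Loader ("cloudFiles") handles incremental file discovery.
       df = (
           spark.readStream.format("cloudFiles")
           .options(**source_details["reader_options"])
           .load(source_details["path"])
       )
       # Apply the optional select expressions and where clause.
       return (
           df.selectExpr(*source_details["select_exp"])
           .where(source_details["where_clause"])
       )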
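
Likewise, the Delta source's Change Data Feed support corresponds to the standard Databricks ``readChangeFeed`` reader option. A minimal sketch, assuming an illustrative three-level table name:

.. code-block:: python

   # Streaming read of a Delta table's Change Data Feed (CDF); the
   # catalog.schema.table name is a placeholder for illustration.
   cdf_stream = (
       spark.readStream.format("delta")
       .option("readChangeFeed", "true")
       .table("main.sales.orders")
   )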