Stream-Static - Streaming Data Warehouse ========================================= Description ------------ Suitable for when you have a streaming table that you need to join to one or many additional static tables in order to derive your desired target data set, but you also want updates to the static tables to be reflected as they occur. Use when: - You want to join multiple streaming tables. - You want changes in any/all tables to be updated as they occur. - You only need to perform basic single row transforms. **Layers:** - Silver - Gold (no complex transforms or aggregations) **Models:** - 3NF such as ODS, Inmon and Enterprise Models - Data Vault - Dimensional: dimensions and basic transactional facts **Data Flow Components:** .. image:: images/stream_static_dwh.png :target: _images/stream_static_dwh.png :alt: Stream Static - Streaming DWH .. list-table:: :header-rows: 1 :widths: 5 15 60 20 * - No. - Component - Description - M / O * - 1 - Input Views - Input views are created over each streaming source table (as many as required). These views need only return: - the columns required for the necessary joins - a Sequence By column if this is an SCD 1/2 use case These views can optionally read from CDF if the source tables are CDF enabled. - M * - 2 - Append Flows - Append flows load only the PK's and Sequnce By columns into a staging table. - M * - 3 - Staging Append Only Table - A streaming append only table, the schema of which consists of only the primary keys and sequence by columns returned by each input view. - M * - 4 - Change Flow - A single change flow loads the data into the staging merge table. It essentially merges and dedupes all the rows. - M * - 5 - Staging Merge Table - A streaming table, the schema of which consists of only the primary keys and sequence by columns. CDF is enabled on this table. - M * - 6 - Stream-static Join View - A view that implements the frameworks delta-join source type. It uses the previous staging table as the driving streaming table, reading from its CDF feed, and performs static joins to ALL the tables defined in the join. - M * - 7 - Final Transform View - A view that applies a SQL transform (SELECT or CTE) to the data returned by the Stream-static Join View. This is optional and not required if no transformation needs to be applied. If you don't have a transform requirement you can omit the transform view. You may for example only need to specify which columns you want or perform a basic column renaming, which you can do in the Stream-static Join View (component 6). - O * - 8 - Append or Change Flow - An Append Flow (for transactional or fact based target tables) or an SCD1/2 Change Flow that loads the data into the final target table. - M * - 9 - Target Table - A streaming table, the schema of which is specified in the dataflowspec. This table is the final target table for the given flow. - M Feature Support ---------------- .. list-table:: :header-rows: 1 :widths: 50 50 * - Supported - Not Supported * - - Append Only & SCD 1/2 - Basic transforms such as: - Data type conversion - Concatenation - Single row calculations - Formatting - Cleansing & Data Quality Rules - Conditionals and calculations (single row) across multiple source tables - Joins - - Complex transforms such as aggregations - Window By Considerations and Limitations ------------------------------- - More complex to implement than the Stream-Static Basic pattern but allows for true streaming joins. Sample ------ - Bundle: ``dlt_framework/src/samples/silver_sample`` - Sample: ``dlt_framework/src/samples/silver_sample/src/dataflows/stream_static_p7`` Example Data Flow ------------------ Day 1 Load ~~~~~~~~~~ - **Source Tables (Append-Only)** CUSTOMER .. list-table:: :header-rows: 1 :widths: 15 15 15 25 30 * - customer_id - first_name - last_name - email - load_timestamp * - 1 - John - Doe - john.doe@example.com - 2023-01-01 10:00 * - 2 - Jane - Smith - jane.smith@example.com - 2023-01-01 10:00 CUSTOMER_ADDRESS .. list-table:: :header-rows: 1 :widths: 15 15 15 30 * - customer_id - city - state - load_timestamp * - 1 - Melbourne - VIC - 2023-01-01 10:00 * - 2 - Melbourne - VIC - 2023-01-01 10:00 * - 4 - Hobart - TAS - 2023-01-01 10:00 - **Staging Table (stg_source_1_appnd)** .. list-table:: :header-rows: 1 :widths: 30 70 * - customer_id - load_timestamp * - 1 - 2023-01-01 10:00 * - 2 - 2023-01-01 10:00 * - 1 - 2023-01-01 10:00 * - 2 - 2023-01-01 10:00 * - 4 - 2023-01-01 10:00 - **Target Table** - Append-Only Scenario .. list-table:: :header-rows: 1 :widths: 12 12 12 12 20 12 10 10 * - customer_id - first_name - last_name - full_name - email - city - state - load_timestamp * - 1 - John - Doe - John Doe - john.doe@example.com - Melbourne - VIC - 2023-01-01 10:00 * - 2 - Jane - Smith - Jane Smith - jane.smith@example.com - Melbourne - VIC - 2023-01-01 10:00 - SCD1 Scenario .. list-table:: :header-rows: 1 :widths: 12 12 12 12 20 12 10 * - customer_id - first_name - last_name - full_name - email - city - state * - 1 - John - Doe - John Doe - john.doe@example.com - Melbourne - VIC * - 2 - Jane - Smith - Jane Smith - jane.smith@example.com - Melbourne - VIC - SCD2 Scenario .. list-table:: :header-rows: 1 :widths: 12 12 12 12 20 12 10 15 15 * - customer_id - first_name - last_name - full_name - email - city - state - _START_AT - _END_AT * - 1 - John - Doe - John Doe - john.doe@example.com - Melbourne - VIC - 2023-01-01 10:00 - NULL * - 2 - Jane - Smith - Jane Smith - jane.smith@example.com - Melbourne - VIC - 2023-01-01 10:00 - NULL Day 2 Load ~~~~~~~~~~~ - **Source Tables (Append-Only)** CUSTOMER .. raw:: html
customer_id first_name last_name email load_timestamp
1 John Doe john.doe@example.com 2023-01-01 10:00
2 Jane Smith jane.smith@example.com 2023-01-01 10:00
1 John Doe jdoe@example.com 2023-01-02 10:00
3 Alice Green alice.green@example.com 2023-01-02 10:00
4 Joe Bloggs joe.bloggs@example.com 2023-01-02 10:00
CUSTOMER_ADDRESS .. raw:: html
customer_id city state load_timestamp
1 Melbourne VIC 2023-01-01 10:00
2 Melbourne VIC 2023-01-01 10:00
4 Hobart TAS 2023-01-01 10:00
2 Perth WA 2023-01-02 10:00
3 Sydney NSW 2023-01-02 10:00
- **Staging Table (stg_source_1_appnd)** .. raw:: html
customer_id load_timestamp
1 2023-01-01 10:00
2 2023-01-01 10:00
1 2023-01-01 10:00
2 2023-01-01 10:00
4 2023-01-01 10:00
1 2023-01-02 10:00
3 2023-01-02 10:00
2 2023-01-02 10:00
3 2023-01-02 10:00
4 2023-01-02 10:00
- **Staging Table (stg_source_1_mrg)** .. raw:: html
customer_id _START_AT _END_AT
1 2023-01-02 10:00 NULL
1 2023-01-01 10:00 2023-01-02 10:00
2 2023-01-02 10:00 NULL
2 2023-01-01 10:00 2023-01-02 10:00
3 2023-01-01 10:00 NULL
4 2023-01-02 10:00 NULL
4 2023-01-01 10:00 2023-01-02 10:00
- **Target Table** - Append-Only Scenario .. raw:: html
customer_id first_name last_name full_name email city state load_timestamp
1 John Doe John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00
2 Jane Smith Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00
1 John Doe John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00
2 Jane Smith Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00
3 Alice Green alice.green@example.com alice.green@example.com Sydney NSW 2023-01-02 10:00
4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00
- SCD1 Scenario .. raw:: html
customer_id first_name last_name full_name email city state
1 John Doe John Doe jdoe@example.com Melbourne VIC
2 Jane Smith Jane Smith jane.smith@example.com Perth WA
3 Alice Green alice.green@example.com alice.green@example.com Sydney NSW
4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS
- SCD2 Scenario .. raw:: html
customer_id first_name last_name full_name email city state _START_AT _END_AT
1 John Doe John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00 NULL
1 John Doe John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00 2023-01-02 10:00
2 Jane Smith Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00 NULL
2 Jane Smith Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00 2023-01-02 10:00
3 Alice Green Alice Green alice.green@example.com Sydney NSW 2023-01-01 10:00 NULL
4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00 NULL
Day 3 Load ~~~~~~~~~~~ - **Source Tables (Append-Only)** CUSTOMER .. raw:: html
customer_id first_name last_name email load_timestamp
1 John Doe john.doe@example.com 2023-01-01 10:00
2 Jane Smith jane.smith@example.com 2023-01-01 10:00
1 John Doe jdoe@example.com 2023-01-02 10:00
3 Alice Green alice.green@example.com 2023-01-02 10:00
4 Joe Bloggs joe.bloggs@example.com 2023-01-02 10:00
CUSTOMER_ADDRESS .. raw:: html
customer_id city state load_timestamp
1 Melbourne VIC 2023-01-01 10:00
2 Melbourne VIC 2023-01-01 10:00
4 Hobart TAS 2023-01-01 10:00
2 Perth WA 2023-01-02 10:00
3 Sydney NSW 2023-01-02 10:00
1 Brisbane QLD 2023-01-03 10:00
- **Staging Table (stg_source_1_appnd)** .. raw:: html
customer_id load_timestamp
1 2023-01-01 10:00
2 2023-01-01 10:00
1 2023-01-01 10:00
2 2023-01-01 10:00
4 2023-01-01 10:00
1 2023-01-02 10:00
3 2023-01-02 10:00
2 2023-01-02 10:00
3 2023-01-02 10:00
1 2023-01-03 10:00
- **Staging Table (stg_source_1_mrg)** .. raw:: html
customer_id _START_AT _END_AT
1 2023-01-03 10:00 NULL
1 2023-01-02 10:00 2023-01-03 10:00
1 2023-01-01 10:00 2023-01-02 10:00
2 2023-01-02 10:00 NULL
2 2023-01-01 10:00 2023-01-02 10:00
3 2023-01-01 10:00 NULL
4 2023-01-02 10:00 NULL
4 2023-01-01 10:00 2023-01-02 10:00
- **Target Table** - Append-Only Scenario .. raw:: html
customer_id first_name last_name full_name email city state load_timestamp
1 John Doe John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00
2 Jane Smith Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00
1 John Doe John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00
2 Jane Smith Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00
3 Alice Green alice.green@example.com alice.green@example.com Sydney NSW 2023-01-02 10:00
4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00
1 John Doe John Doe jdoe@example.com Brisbane QLD 2023-01-02 10:00
- SCD1 Scenario .. raw:: html
customer_id first_name last_name full_name email city state
1 John Doe John Doe jdoe@example.com Brisbane QLD
2 Jane Smith Jane Smith jane.smith@example.com Perth WA
3 Alice Green alice.green@example.com alice.green@example.com Sydney NSW
4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS
- SCD2 Scenario .. raw:: html
customer_id first_name last_name full_name email city state _START_AT _END_AT
1 John Doe John Doe jdoe@example.com Brisbane QLD 2023-01-03 10:00 NULL
1 John Doe John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00 2023-01-03 10:00
1 John Doe John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00 2023-01-03 10:00
2 Jane Smith Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00 NULL
2 Jane Smith Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00 2023-01-02 10:00
3 Alice Green Alice Green alice.green@example.com Sydney NSW 2023-01-01 10:00 NULL
4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00 NULL