Stream-Static - Streaming Data Warehouse
=========================================
Description
------------
Suitable for when you have a streaming table that you need to join to one or many additional static tables in order to derive your desired target data set, but you also want updates to the static tables to be reflected as they occur.
Use when:
- You want to join multiple streaming tables.
- You want changes in any/all tables to be updated as they occur.
- You only need to perform basic single row transforms.
**Layers:**
- Silver
- Gold (no complex transforms or aggregations)
**Models:**
- 3NF such as ODS, Inmon and Enterprise Models
- Data Vault
- Dimensional: dimensions and basic transactional facts
**Data Flow Components:**
.. image:: images/stream_static_dwh.png
:target: _images/stream_static_dwh.png
:alt: Stream Static - Streaming DWH
.. list-table::
:header-rows: 1
:widths: 5 15 60 20
* - No.
- Component
- Description
- M / O
* - 1
- Input Views
- Input views are created over each streaming source table (as many as required). These views need only return:
- the columns required for the necessary joins
- a Sequence By column if this is an SCD 1/2 use case
These views can optionally read from CDF if the source tables are CDF enabled.
- M
* - 2
- Append Flows
- Append flows load only the PK's and Sequnce By columns into a staging table.
- M
* - 3
- Staging Append Only Table
- A streaming append only table, the schema of which consists of only the primary keys and sequence by columns returned by each input view.
- M
* - 4
- Change Flow
- A single change flow loads the data into the staging merge table. It essentially merges and dedupes all the rows.
- M
* - 5
- Staging Merge Table
- A streaming table, the schema of which consists of only the primary keys and sequence by columns. CDF is enabled on this table.
- M
* - 6
- Stream-static Join View
- A view that implements the frameworks delta-join source type. It uses the previous staging table as the driving streaming table, reading from its CDF feed, and performs static joins to ALL the tables defined in the join.
- M
* - 7
- Final Transform View
- A view that applies a SQL transform (SELECT or CTE) to the data returned by the Stream-static Join View. This is optional and not required if no transformation needs to be applied. If you don't have a transform requirement you can omit the transform view. You may for example only need to specify which columns you want or perform a basic column renaming, which you can do in the Stream-static Join View (component 6).
- O
* - 8
- Append or Change Flow
- An Append Flow (for transactional or fact based target tables) or an SCD1/2 Change Flow that loads the data into the final target table.
- M
* - 9
- Target Table
- A streaming table, the schema of which is specified in the dataflowspec. This table is the final target table for the given flow.
- M
Feature Support
----------------
.. list-table::
:header-rows: 1
:widths: 50 50
* - Supported
- Not Supported
* - - Append Only & SCD 1/2
- Basic transforms such as:
- Data type conversion
- Concatenation
- Single row calculations
- Formatting
- Cleansing & Data Quality Rules
- Conditionals and calculations (single row) across multiple source tables
- Joins
- - Complex transforms such as aggregations
- Window By
Considerations and Limitations
-------------------------------
- More complex to implement than the Stream-Static Basic pattern but allows for true streaming joins.
Sample
------
- Bundle: ``dlt_framework/src/samples/silver_sample``
- Sample: ``dlt_framework/src/samples/silver_sample/src/dataflows/stream_static_p7``
Example Data Flow
------------------
Day 1 Load
~~~~~~~~~~
- **Source Tables (Append-Only)**
CUSTOMER
.. list-table::
:header-rows: 1
:widths: 15 15 15 25 30
* - customer_id
- first_name
- last_name
- email
- load_timestamp
* - 1
- John
- Doe
- john.doe@example.com
- 2023-01-01 10:00
* - 2
- Jane
- Smith
- jane.smith@example.com
- 2023-01-01 10:00
CUSTOMER_ADDRESS
.. list-table::
:header-rows: 1
:widths: 15 15 15 30
* - customer_id
- city
- state
- load_timestamp
* - 1
- Melbourne
- VIC
- 2023-01-01 10:00
* - 2
- Melbourne
- VIC
- 2023-01-01 10:00
* - 4
- Hobart
- TAS
- 2023-01-01 10:00
- **Staging Table (stg_source_1_appnd)**
.. list-table::
:header-rows: 1
:widths: 30 70
* - customer_id
- load_timestamp
* - 1
- 2023-01-01 10:00
* - 2
- 2023-01-01 10:00
* - 1
- 2023-01-01 10:00
* - 2
- 2023-01-01 10:00
* - 4
- 2023-01-01 10:00
- **Target Table**
- Append-Only Scenario
.. list-table::
:header-rows: 1
:widths: 12 12 12 12 20 12 10 10
* - customer_id
- first_name
- last_name
- full_name
- email
- city
- state
- load_timestamp
* - 1
- John
- Doe
- John Doe
- john.doe@example.com
- Melbourne
- VIC
- 2023-01-01 10:00
* - 2
- Jane
- Smith
- Jane Smith
- jane.smith@example.com
- Melbourne
- VIC
- 2023-01-01 10:00
- SCD1 Scenario
.. list-table::
:header-rows: 1
:widths: 12 12 12 12 20 12 10
* - customer_id
- first_name
- last_name
- full_name
- email
- city
- state
* - 1
- John
- Doe
- John Doe
- john.doe@example.com
- Melbourne
- VIC
* - 2
- Jane
- Smith
- Jane Smith
- jane.smith@example.com
- Melbourne
- VIC
- SCD2 Scenario
.. list-table::
:header-rows: 1
:widths: 12 12 12 12 20 12 10 15 15
* - customer_id
- first_name
- last_name
- full_name
- email
- city
- state
- _START_AT
- _END_AT
* - 1
- John
- Doe
- John Doe
- john.doe@example.com
- Melbourne
- VIC
- 2023-01-01 10:00
- NULL
* - 2
- Jane
- Smith
- Jane Smith
- jane.smith@example.com
- Melbourne
- VIC
- 2023-01-01 10:00
- NULL
Day 2 Load
~~~~~~~~~~~
- **Source Tables (Append-Only)**
CUSTOMER
.. raw:: html
| customer_id | first_name | last_name | email | load_timestamp |
| 1 | John | Doe | john.doe@example.com | 2023-01-01 10:00 |
| 2 | Jane | Smith | jane.smith@example.com | 2023-01-01 10:00 |
| 1 | John | Doe | jdoe@example.com | 2023-01-02 10:00 |
| 3 | Alice | Green | alice.green@example.com | 2023-01-02 10:00 |
| 4 | Joe | Bloggs | joe.bloggs@example.com | 2023-01-02 10:00 |
CUSTOMER_ADDRESS
.. raw:: html
| customer_id | city | state | load_timestamp |
| 1 | Melbourne | VIC | 2023-01-01 10:00 |
| 2 | Melbourne | VIC | 2023-01-01 10:00 |
| 4 | Hobart | TAS | 2023-01-01 10:00 |
| 2 | Perth | WA | 2023-01-02 10:00 |
| 3 | Sydney | NSW | 2023-01-02 10:00 |
- **Staging Table (stg_source_1_appnd)**
.. raw:: html
| customer_id | load_timestamp |
| 1 | 2023-01-01 10:00 |
| 2 | 2023-01-01 10:00 |
| 1 | 2023-01-01 10:00 |
| 2 | 2023-01-01 10:00 |
| 4 | 2023-01-01 10:00 |
| 1 | 2023-01-02 10:00 |
| 3 | 2023-01-02 10:00 |
| 2 | 2023-01-02 10:00 |
| 3 | 2023-01-02 10:00 |
| 4 | 2023-01-02 10:00 |
- **Staging Table (stg_source_1_mrg)**
.. raw:: html
| customer_id | _START_AT | _END_AT |
| 1 | 2023-01-02 10:00 | NULL |
| 1 | 2023-01-01 10:00 | 2023-01-02 10:00 |
| 2 | 2023-01-02 10:00 | NULL |
| 2 | 2023-01-01 10:00 | 2023-01-02 10:00 |
| 3 | 2023-01-01 10:00 | NULL |
| 4 | 2023-01-02 10:00 | NULL |
| 4 | 2023-01-01 10:00 | 2023-01-02 10:00 |
- **Target Table**
- Append-Only Scenario
.. raw:: html
| customer_id | first_name | last_name | full_name | email | city | state | load_timestamp |
| 1 | John | Doe | John Doe | john.doe@example.com | Melbourne | VIC | 2023-01-01 10:00 |
| 2 | Jane | Smith | Jane Smith | jane.smith@example.com | Melbourne | VIC | 2023-01-01 10:00 |
| 1 | John | Doe | John Doe | jdoe@example.com | Melbourne | VIC | 2023-01-02 10:00 |
| 2 | Jane | Smith | Jane Smith | jane.smith@example.com | Perth | WA | 2023-01-02 10:00 |
| 3 | Alice | Green | alice.green@example.com | alice.green@example.com | Sydney | NSW | 2023-01-02 10:00 |
| 4 | Joe | Bloggs | Joe Bloggs | joe.bloggs@example.com | Hobart | TAS | 2023-01-02 10:00 |
- SCD1 Scenario
.. raw:: html
| customer_id | first_name | last_name | full_name | email | city | state |
| 1 | John | Doe | John Doe | jdoe@example.com | Melbourne | VIC |
| 2 | Jane | Smith | Jane Smith | jane.smith@example.com | Perth | WA |
| 3 | Alice | Green | alice.green@example.com | alice.green@example.com | Sydney | NSW |
| 4 | Joe | Bloggs | Joe Bloggs | joe.bloggs@example.com | Hobart | TAS |
- SCD2 Scenario
.. raw:: html
| customer_id | first_name | last_name | full_name | email | city | state | _START_AT | _END_AT |
| 1 | John | Doe | John Doe | jdoe@example.com | Melbourne | VIC | 2023-01-02 10:00 | NULL |
| 1 | John | Doe | John Doe | john.doe@example.com | Melbourne | VIC | 2023-01-01 10:00 | 2023-01-02 10:00 |
| 2 | Jane | Smith | Jane Smith | jane.smith@example.com | Perth | WA | 2023-01-02 10:00 | NULL |
| 2 | Jane | Smith | Jane Smith | jane.smith@example.com | Melbourne | VIC | 2023-01-01 10:00 | 2023-01-02 10:00 |
| 3 | Alice | Green | Alice Green | alice.green@example.com | Sydney | NSW | 2023-01-01 10:00 | NULL |
| 4 | Joe | Bloggs | Joe Bloggs | joe.bloggs@example.com | Hobart | TAS | 2023-01-02 10:00 | NULL |
Day 3 Load
~~~~~~~~~~~
- **Source Tables (Append-Only)**
CUSTOMER
.. raw:: html
| customer_id | first_name | last_name | email | load_timestamp |
| 1 | John | Doe | john.doe@example.com | 2023-01-01 10:00 |
| 2 | Jane | Smith | jane.smith@example.com | 2023-01-01 10:00 |
| 1 | John | Doe | jdoe@example.com | 2023-01-02 10:00 |
| 3 | Alice | Green | alice.green@example.com | 2023-01-02 10:00 |
| 4 | Joe | Bloggs | joe.bloggs@example.com | 2023-01-02 10:00 |
CUSTOMER_ADDRESS
.. raw:: html
| customer_id | city | state | load_timestamp |
| 1 | Melbourne | VIC | 2023-01-01 10:00 |
| 2 | Melbourne | VIC | 2023-01-01 10:00 |
| 4 | Hobart | TAS | 2023-01-01 10:00 |
| 2 | Perth | WA | 2023-01-02 10:00 |
| 3 | Sydney | NSW | 2023-01-02 10:00 |
| 1 | Brisbane | QLD | 2023-01-03 10:00 |
- **Staging Table (stg_source_1_appnd)**
.. raw:: html
| customer_id | load_timestamp |
| 1 | 2023-01-01 10:00 |
| 2 | 2023-01-01 10:00 |
| 1 | 2023-01-01 10:00 |
| 2 | 2023-01-01 10:00 |
| 4 | 2023-01-01 10:00 |
| 1 | 2023-01-02 10:00 |
| 3 | 2023-01-02 10:00 |
| 2 | 2023-01-02 10:00 |
| 3 | 2023-01-02 10:00 |
| 1 | 2023-01-03 10:00 |
- **Staging Table (stg_source_1_mrg)**
.. raw:: html
| customer_id | _START_AT | _END_AT |
| 1 | 2023-01-03 10:00 | NULL |
| 1 | 2023-01-02 10:00 | 2023-01-03 10:00 |
| 1 | 2023-01-01 10:00 | 2023-01-02 10:00 |
| 2 | 2023-01-02 10:00 | NULL |
| 2 | 2023-01-01 10:00 | 2023-01-02 10:00 |
| 3 | 2023-01-01 10:00 | NULL |
| 4 | 2023-01-02 10:00 | NULL |
| 4 | 2023-01-01 10:00 | 2023-01-02 10:00 |
- **Target Table**
- Append-Only Scenario
.. raw:: html
| customer_id | first_name | last_name | full_name | email | city | state | load_timestamp |
| 1 | John | Doe | John Doe | john.doe@example.com | Melbourne | VIC | 2023-01-01 10:00 |
| 2 | Jane | Smith | Jane Smith | jane.smith@example.com | Melbourne | VIC | 2023-01-01 10:00 |
| 1 | John | Doe | John Doe | jdoe@example.com | Melbourne | VIC | 2023-01-02 10:00 |
| 2 | Jane | Smith | Jane Smith | jane.smith@example.com | Perth | WA | 2023-01-02 10:00 |
| 3 | Alice | Green | alice.green@example.com | alice.green@example.com | Sydney | NSW | 2023-01-02 10:00 |
| 4 | Joe | Bloggs | Joe Bloggs | joe.bloggs@example.com | Hobart | TAS | 2023-01-02 10:00 |
| 1 | John | Doe | John Doe | jdoe@example.com | Brisbane | QLD | 2023-01-02 10:00 |
- SCD1 Scenario
.. raw:: html
| customer_id | first_name | last_name | full_name | email | city | state |
| 1 | John | Doe | John Doe | jdoe@example.com | Brisbane | QLD |
| 2 | Jane | Smith | Jane Smith | jane.smith@example.com | Perth | WA |
| 3 | Alice | Green | alice.green@example.com | alice.green@example.com | Sydney | NSW |
| 4 | Joe | Bloggs | Joe Bloggs | joe.bloggs@example.com | Hobart | TAS |
- SCD2 Scenario
.. raw:: html
| customer_id | first_name | last_name | full_name | email | city | state | _START_AT | _END_AT |
| 1 | John | Doe | John Doe | jdoe@example.com | Brisbane | QLD | 2023-01-03 10:00 | NULL |
| 1 | John | Doe | John Doe | jdoe@example.com | Melbourne | VIC | 2023-01-02 10:00 | 2023-01-03 10:00 |
| 1 | John | Doe | John Doe | john.doe@example.com | Melbourne | VIC | 2023-01-01 10:00 | 2023-01-03 10:00 |
| 2 | Jane | Smith | Jane Smith | jane.smith@example.com | Perth | WA | 2023-01-02 10:00 | NULL |
| 2 | Jane | Smith | Jane Smith | jane.smith@example.com | Melbourne | VIC | 2023-01-01 10:00 | 2023-01-02 10:00 |
| 3 | Alice | Green | Alice Green | alice.green@example.com | Sydney | NSW | 2023-01-01 10:00 | NULL |
| 4 | Joe | Bloggs | Joe Bloggs | joe.bloggs@example.com | Hobart | TAS | 2023-01-02 10:00 | NULL |