Stream-Static - Streaming Data Warehouse
Description
Suitable for when you have a streaming table that you need to join to one or many additional static tables in order to derive your desired target data set, but you also want updates to the static tables to be reflected as they occur.
Use when:
You want to join multiple streaming tables.
You want changes in any/all tables to be updated as they occur.
You only need to perform basic single row transforms.
Layers:
Silver
Gold (no complex transforms or aggregations)
Models:
3NF such as ODS, Inmon and Enterprise Models
Data Vault
Dimensional: dimensions and basic transactional facts
Data Flow Components:
No. |
Component |
Description |
M / O |
|---|---|---|---|
1 |
Input Views |
Input views are created over each streaming source table (as many as required). These views need only return:
These views can optionally read from CDF if the source tables are CDF enabled. |
M |
2 |
Append Flows |
Append flows load only the PK’s and Sequnce By columns into a staging table. |
M |
3 |
Staging Append Only Table |
A streaming append only table, the schema of which consists of only the primary keys and sequence by columns returned by each input view. |
M |
4 |
Change Flow |
A single change flow loads the data into the staging merge table. It essentially merges and dedupes all the rows. |
M |
5 |
Staging Merge Table |
A streaming table, the schema of which consists of only the primary keys and sequence by columns. CDF is enabled on this table. |
M |
6 |
Stream-static Join View |
A view that implements the frameworks delta-join source type. It uses the previous staging table as the driving streaming table, reading from its CDF feed, and performs static joins to ALL the tables defined in the join. |
M |
7 |
Final Transform View |
A view that applies a SQL transform (SELECT or CTE) to the data returned by the Stream-static Join View. This is optional and not required if no transformation needs to be applied. If you don’t have a transform requirement you can omit the transform view. You may for example only need to specify which columns you want or perform a basic column renaming, which you can do in the Stream-static Join View (component 6). |
O |
8 |
Append or Change Flow |
An Append Flow (for transactional or fact based target tables) or an SCD1/2 Change Flow that loads the data into the final target table. |
M |
9 |
Target Table |
A streaming table, the schema of which is specified in the dataflowspec. This table is the final target table for the given flow. |
M |
Feature Support
Supported |
Not Supported |
|---|---|
|
|
Considerations and Limitations
More complex to implement than the Stream-Static Basic pattern but allows for true streaming joins.
Sample
Bundle:
dlt_framework/src/samples/silver_sampleSample:
dlt_framework/src/samples/silver_sample/src/dataflows/stream_static_p7
Example Data Flow
Day 1 Load
Source Tables (Append-Only)
CUSTOMER
customer_id
first_name
last_name
email
load_timestamp
1
John
Doe
2023-01-01 10:00
2
Jane
Smith
2023-01-01 10:00
CUSTOMER_ADDRESS
customer_id
city
state
load_timestamp
1
Melbourne
VIC
2023-01-01 10:00
2
Melbourne
VIC
2023-01-01 10:00
4
Hobart
TAS
2023-01-01 10:00
Staging Table (stg_source_1_appnd)
customer_id
load_timestamp
1
2023-01-01 10:00
2
2023-01-01 10:00
1
2023-01-01 10:00
2
2023-01-01 10:00
4
2023-01-01 10:00
Target Table
Append-Only Scenario
customer_id
first_name
last_name
full_name
email
city
state
load_timestamp
1
John
Doe
John Doe
Melbourne
VIC
2023-01-01 10:00
2
Jane
Smith
Jane Smith
Melbourne
VIC
2023-01-01 10:00
SCD1 Scenario
customer_id
first_name
last_name
full_name
email
city
state
1
John
Doe
John Doe
Melbourne
VIC
2
Jane
Smith
Jane Smith
Melbourne
VIC
SCD2 Scenario
customer_id
first_name
last_name
full_name
email
city
state
_START_AT
_END_AT
1
John
Doe
John Doe
Melbourne
VIC
2023-01-01 10:00
NULL
2
Jane
Smith
Jane Smith
Melbourne
VIC
2023-01-01 10:00
NULL
Day 2 Load
Source Tables (Append-Only)
CUSTOMER
customer_id first_name last_name email load_timestamp 1 John Doe john.doe@example.com 2023-01-01 10:00 2 Jane Smith jane.smith@example.com 2023-01-01 10:00 1 John Doe jdoe@example.com 2023-01-02 10:00 3 Alice Green alice.green@example.com 2023-01-02 10:00 4 Joe Bloggs joe.bloggs@example.com 2023-01-02 10:00 CUSTOMER_ADDRESS
customer_id city state load_timestamp 1 Melbourne VIC 2023-01-01 10:00 2 Melbourne VIC 2023-01-01 10:00 4 Hobart TAS 2023-01-01 10:00 2 Perth WA 2023-01-02 10:00 3 Sydney NSW 2023-01-02 10:00 Staging Table (stg_source_1_appnd)
customer_id load_timestamp 1 2023-01-01 10:00 2 2023-01-01 10:00 1 2023-01-01 10:00 2 2023-01-01 10:00 4 2023-01-01 10:00 1 2023-01-02 10:00 3 2023-01-02 10:00 2 2023-01-02 10:00 3 2023-01-02 10:00 4 2023-01-02 10:00 Staging Table (stg_source_1_mrg)
customer_id _START_AT _END_AT 1 2023-01-02 10:00 NULL 1 2023-01-01 10:00 2023-01-02 10:00 2 2023-01-02 10:00 NULL 2 2023-01-01 10:00 2023-01-02 10:00 3 2023-01-01 10:00 NULL 4 2023-01-02 10:00 NULL 4 2023-01-01 10:00 2023-01-02 10:00 Target Table
Append-Only Scenario
customer_id first_name last_name full_name email city state load_timestamp 1 John Doe John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00 2 Jane Smith Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00 1 John Doe John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00 2 Jane Smith Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00 3 Alice Green alice.green@example.com alice.green@example.com Sydney NSW 2023-01-02 10:00 4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00 SCD1 Scenario
customer_id first_name last_name full_name email city state 1 John Doe John Doe jdoe@example.com Melbourne VIC 2 Jane Smith Jane Smith jane.smith@example.com Perth WA 3 Alice Green alice.green@example.com alice.green@example.com Sydney NSW 4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS SCD2 Scenario
customer_id first_name last_name full_name email city state _START_AT _END_AT 1 John Doe John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00 NULL 1 John Doe John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00 2023-01-02 10:00 2 Jane Smith Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00 NULL 2 Jane Smith Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00 2023-01-02 10:00 3 Alice Green Alice Green alice.green@example.com Sydney NSW 2023-01-01 10:00 NULL 4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00 NULL
Day 3 Load
Source Tables (Append-Only)
CUSTOMER
customer_id first_name last_name email load_timestamp 1 John Doe john.doe@example.com 2023-01-01 10:00 2 Jane Smith jane.smith@example.com 2023-01-01 10:00 1 John Doe jdoe@example.com 2023-01-02 10:00 3 Alice Green alice.green@example.com 2023-01-02 10:00 4 Joe Bloggs joe.bloggs@example.com 2023-01-02 10:00 CUSTOMER_ADDRESS
customer_id city state load_timestamp 1 Melbourne VIC 2023-01-01 10:00 2 Melbourne VIC 2023-01-01 10:00 4 Hobart TAS 2023-01-01 10:00 2 Perth WA 2023-01-02 10:00 3 Sydney NSW 2023-01-02 10:00 1 Brisbane QLD 2023-01-03 10:00 Staging Table (stg_source_1_appnd)
customer_id load_timestamp 1 2023-01-01 10:00 2 2023-01-01 10:00 1 2023-01-01 10:00 2 2023-01-01 10:00 4 2023-01-01 10:00 1 2023-01-02 10:00 3 2023-01-02 10:00 2 2023-01-02 10:00 3 2023-01-02 10:00 1 2023-01-03 10:00 Staging Table (stg_source_1_mrg)
customer_id _START_AT _END_AT 1 2023-01-03 10:00 NULL 1 2023-01-02 10:00 2023-01-03 10:00 1 2023-01-01 10:00 2023-01-02 10:00 2 2023-01-02 10:00 NULL 2 2023-01-01 10:00 2023-01-02 10:00 3 2023-01-01 10:00 NULL 4 2023-01-02 10:00 NULL 4 2023-01-01 10:00 2023-01-02 10:00 Target Table
Append-Only Scenario
customer_id first_name last_name full_name email city state load_timestamp 1 John Doe John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00 2 Jane Smith Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00 1 John Doe John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00 2 Jane Smith Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00 3 Alice Green alice.green@example.com alice.green@example.com Sydney NSW 2023-01-02 10:00 4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00 1 John Doe John Doe jdoe@example.com Brisbane QLD 2023-01-02 10:00 SCD1 Scenario
customer_id first_name last_name full_name email city state 1 John Doe John Doe jdoe@example.com Brisbane QLD 2 Jane Smith Jane Smith jane.smith@example.com Perth WA 3 Alice Green alice.green@example.com alice.green@example.com Sydney NSW 4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS SCD2 Scenario
customer_id first_name last_name full_name email city state _START_AT _END_AT 1 John Doe John Doe jdoe@example.com Brisbane QLD 2023-01-03 10:00 NULL 1 John Doe John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00 2023-01-03 10:00 1 John Doe John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00 2023-01-03 10:00 2 Jane Smith Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00 NULL 2 Jane Smith Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00 2023-01-02 10:00 3 Alice Green Alice Green alice.green@example.com Sydney NSW 2023-01-01 10:00 NULL 4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00 NULL