Multi-Source Streaming

Description

Suitable for multi-source streaming with basic transformations. Use when:

  • You need to stream multiple tables in a single target table via a basic transform.

  • The source tables share common business keys.

  • You only need to perform basic single row transforms.

Layers: Generally Silver

Models:

  • 3NF such as ODS, Inmon and Enterprise Models

  • Data Vault

Data Flow Components:

Multi-Source Streaming - Base

No.

Component

Description

M / O

1

Input Views

Input views are created over each streaming source table (as many as required). All source tables must share a common set of PK’s and must contain a sequence by column of the same data type. These views can optionally read from CDF if the source tables are CDF enabled.

M

2

Append Flows

Append flows load the rows from each streaming source table into a staging table.

M

3

Staging Append Only Table

A streaming append only table, the schema of which consists of the common primary keys, sequence by and the data columns returned by each input view.

M

4

Change Flow

A single change flow loads the data into the staging merge table. It essentially merges and dedupes all the rows on the common PK’s.

M

5

Staging Merge Table

A streaming table, the schema of which consists of the common primary keys, sequence by and the data columns returned by each input view. CDF is enabled on this table.

M

6

Final CDF View

A view over the staging merge table that reads a stream from the merge tables change data feed.

M

7

Final Transform View

A view that applies a SQL transform (SELECT or CTE) to the data returned by the Final CDF View. This is optional and not required if no transformation needs to be applied. If you don’t have a transform requirement you can omit the transform view. You may for example only need to specify which columns you want or perform a basic column renaming, which you can do in the Final CDF View (component 6).

O

8

Append or Change Flow

An Append Flow (for transactional or fact based target tables) or an SCD1/2 Change Flow that loads the data into the final target table.

M

9

Target Table

A streaming table, the schema of which is specified in the dataflowspec. This table is the final target table for the given flow.

M

* M / O: Mandatory or Optional.

Feature Support

Supported

Not Supported

  • Append Only & SCD 1/2

  • Basic transforms such as:

    • Data type conversion

    • Concatenation

    • Single row calculations

    • Formatting

  • Cleansing & Data Quality Rules

  • Conditionals and calculations (single row) across multiple source tables

  • Complex transforms such as aggregations

  • Joins

  • Window By

Considerations and Limitations

Important

  • All source tables must share the same business keys. The column names do not need to be the same in the sources, but the keys must be conceptually the same.

  • In SCD 2 scenarios, a new version of a row will be generated any time data changes in any of the source streams. This will be particularly noticeable when you have late arriving records across streams and will lead to more row versions than normally expected.

Sample

  • Bundle: dlt_framework/src/samples/silver_sample

  • Sample: dlt_framework/src/samples/silver_sample/src/dataflows/customer_p4

Example Data Flow

Note

In the below example data flow, customer ID 4 demonstrates the behaviour for late arriving records in streaming sources. This ultimately means you will potentially have more versions for a row in SCD2 scenarios. This needs to be weighed against your requirements and will either be acceptable or not. If this is not acceptable, please refer to the stream-static patterns below.

Day 1 Load

  • Source Tables (Append-Only)

    CUSTOMER

    customer_id

    first_name

    last_name

    email

    load_timestamp

    1

    John

    Doe

    john.doe@example.com

    2023-01-01 10:00

    2

    Jane

    Smith

    jane.smith@example.com

    2023-01-01 10:00

    CUSTOMER_ADDRESS

    customer_id

    city

    state

    load_timestamp

    1

    Melbourne

    VIC

    2023-01-01 10:00

    2

    Melbourne

    VIC

    2023-01-01 10:00

    4

    Hobart

    TAS

    2023-01-01 10:00

  • Staging Table (stg_source_1_appnd)

customer_id

first_name

last_name

email

city

state

load_timestamp

1

John

Doe

john.doe@example.com

NULL

NULL

2023-01-01 10:00

2

Jane

Smith

jane.smith@example.com

NULL

NULL

2023-01-01 10:00

1

NULL

NULL

NULL

Melbourne

VIC

2023-01-01 10:00

2

NULL

NULL

NULL

Melbourne

VIC

2023-01-01 10:00

4

NULL

NULL

NULL

Hobart

TAS

2023-01-01 10:00

  • Staging Table (stg_source_1_mrg)

customer_id

first_name

last_name

email

city

state

_START_AT

_END_AT

1

John

Doe

john.doe@example.com

Melbourne

VIC

2023-01-01 10:00

NULL

2

Jane

Smith

jane.smith@example.com

Melbourne

VIC

2023-01-01 10:00

NULL

4

NULL

NULL

NULL

Hobart

TAS

2023-01-01 10:00

NULL

  • Target Table

    • Append-Only Scenario

    customer_id

    first_name

    last_name

    full_name

    email

    city

    state

    load_timestamp

    1

    John

    Doe

    John Doe

    john.doe@example.com

    Melbourne

    VIC

    2023-01-01 10:00

    2

    Jane

    Smith

    Jane Smith

    jane.smith@example.com

    Melbourne

    VIC

    2023-01-01 10:00

    4

    NULL

    NULL

    NULL

    NULL

    Hobart

    TAS

    2023-01-01 10:00

    • SCD1 Scenario

    customer_id

    first_name

    last_name

    full_name

    email

    city

    state

    1

    John

    Doe

    John Doe

    john.doe@example.com

    Melbourne

    VIC

    2

    Jane

    Smith

    Jane Smith

    jane.smith@example.com

    Melbourne

    VIC

    4

    NULL

    NULL

    NULL

    NULL

    Hobart

    TAS

    • SCD2 Scenario

    customer_id

    first_name

    last_name

    full_name

    email

    city

    state

    _START_AT

    _END_AT

    1

    John

    Doe

    John Doe

    john.doe@example.com

    Melbourne

    VIC

    2023-01-01 10:00

    NULL

    2

    Jane

    Smith

    Jane Smith

    jane.smith@example.com

    Melbourne

    VIC

    2023-01-01 10:00

    NULL

    4

    NULL

    NULL

    NULL

    NULL

    Hobart

    TAS

    2023-01-01 10:00

    NULL

Day 2 Load

  • Source Tables (Append-Only)

    CUSTOMER

    customer_id first_name last_name email load_timestamp
    1 John Doe john.doe@example.com 2023-01-01 10:00
    2 Jane Smith jane.smith@example.com 2023-01-01 10:00
    1 John Doe jdoe@example.com 2023-01-02 10:00
    3 Alice Green alice.green@example.com 2023-01-02 10:00
    4 Joe Bloggs joe.bloggs@example.com 2023-01-02 10:00

    CUSTOMER_ADDRESS

    customer_id city state load_timestamp
    1 Melbourne VIC 2023-01-01 10:00
    2 Melbourne VIC 2023-01-01 10:00
    4 Hobart TAS 2023-01-01 10:00
    2 Perth WA 2023-01-02 10:00
    3 Sydney NSW 2023-01-02 10:00
  • Staging Table (stg_source_1_appnd)

    customer_id first_name last_name email city state load_timestamp
    1 John Doe john.doe@example.com NULL NULL 2023-01-01 10:00
    2 Jane Smith jane.smith@example.com NULL NULL 2023-01-01 10:00
    1 NULL NULL NULL Melbourne VIC 2023-01-01 10:00
    2 NULL NULL NULL Melbourne VIC 2023-01-01 10:00
    4 NULL NULL NULL Hobart TAS 2023-01-01 10:00
    1 John Doe jdoe@example.com NULL NULL 2023-01-02 10:00
    3 Alice Green alice.green@example.com NULL NULL 2023-01-02 10:00
    2 NULL NULL NULL Perth WA 2023-01-02 10:00
    3 NULL NULL NULL Sydney NSW 2023-01-02 10:00
    4 Joe Bloggs joe.bloggs@example.com NULL NULL 2023-01-02 10:00
  • Staging Table (stg_source_1_mrg)

    customer_id first_name last_name email city state _START_AT _END_AT
    1 John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00 NULL
    1 John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00 2023-01-02 10:00
    2 Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00 NULL
    2 Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00 2023-01-02 10:00
    3 Alice Green alice.green@example.com Sydney NSW 2023-01-01 10:00 NULL
    4 Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00 NULL
    4 NULL NULL NULL Hobart TAS 2023-01-01 10:00 2023-01-02 10:00
  • Target Table

    • Append-Only Scenario

      customer_id first_name last_name full_name email city state load_timestamp
      1 John Doe John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00
      2 Jane Smith Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00
      4 NULL NULL NULL NULL Hobart TAS 2023-01-01 10:00
      1 John Doe John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00
      2 Jane Smith Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00
      3 Alice Green alice.green@example.com alice.green@example.com Sydney NSW 2023-01-02 10:00
      4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00
    • SCD1 Scenario

      customer_id first_name last_name full_name email city state
      1 John Doe John Doe jdoe@example.com Melbourne VIC
      2 Jane Smith Jane Smith jane.smith@example.com Perth WA
      3 Alice Green alice.green@example.com alice.green@example.com Sydney NSW
      4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS
    • SCD2 Scenario

      customer_id first_name last_name full_name email city state _START_AT _END_AT
      1 John Doe John Doe jdoe@example.com Melbourne VIC 2023-01-02 10:00 NULL
      1 John Doe John Doe john.doe@example.com Melbourne VIC 2023-01-01 10:00 2023-01-02 10:00
      2 Jane Smith Jane Smith jane.smith@example.com Perth WA 2023-01-02 10:00 NULL
      2 Jane Smith Jane Smith jane.smith@example.com Melbourne VIC 2023-01-01 10:00 2023-01-02 10:00
      3 Alice Green Alice Green alice.green@example.com Sydney NSW 2023-01-01 10:00 NULL
      4 Joe Bloggs Joe Bloggs joe.bloggs@example.com Hobart TAS 2023-01-02 10:00 NULL
      4 NULL NULL NULL NULL Hobart TAS 2023-01-01 10:00 2023-01-02 10:00