Data Flow Spec - Source Details

The sourceDetails object can be any of the following, based on the sourceType:

Batch Files

The sourceBatchFiles object contains the following properties:

Property

Type

Description

format

string

The format of the batch files. Supported: [“csv”, “json”, “parquet”, “text”, “xml”]

path

string

The path to the batch files.

readerOptions

object

Options for reading the batch files. See definitions_sources.json schema for supported options.

selectExp (optional)

array

An array of select expressions. Items: string

whereClause (optional)

array

An array of where clauses. Items: string

schemaPath (optional)

string

The schema path.

pythonTransform (optional)

string

The Python transform configuration. See Python Transform Object for supported options.

Cloud Files

The sourceCloudFiles object contains the following properties:

Property

Type

Description

path

string

The path to the cloud files.

readerOptions

object

Options for reading the cloud files. See definitions_sources.json schema for supported options.

selectExp (optional)

array

An array of select expressions. Items: string

whereClause (optional)

array

An array of where clauses. Items: string

schemaPath (optional)

string

The schema path.

pythonTransform (optional)

string

The Python transform configuration. See Python Transform Object for supported options.

Delta

The sourceDelta object contains the following properties:

Property

Type

Description

database

string

The database name.

table

string

The table name.

cdfEnabled

boolean

Whether change data feed (CDF) is enabled.

tablePath (optional)

string

The table path.

selectExp (optional)

array

An array of select expressions. Items: string

whereClause (optional)

array

An array of where clauses. Items: string

schemaPath (optional)

string

The schema path.

readerOptions (optional)

object

Additional reader options. See definitions_sources.json schema for supported options.

pythonTransform (optional)

string

The Python transform configuration. See Python Transform Object for supported options.

startingVersionFromDLTSetup (optional)

boolean

Whether to automatically set reader option ‘startingVersion’ to the last time the SDP Setup operation was run on the source table. This helps to ensure CDF is read from the last time source table was reset (full refresh).

Delta Join

The sourceDeltaJoin object contains the following properties:

Sources:

Property

Type

Description

database

string

The database name.

table

string

The table name.

alias

string

The alias for the table.

joinMode

string

The join mode. Supported: [“stream”, “static”], Default: “stream”

cdfEnabled

boolean

Whether change data feed (CDF) is enabled.

tablePath (optional)

string

The table path.

selectExp (optional)

array

An array of select expressions. Items: string

whereClause (optional)

array

An array of where clauses. Items: string

schemaPath (optional)

string

The schema path.

readerOptions (optional)

object

Additional reader options. See definitions_sources.json schema for supported options.

pythonTransform (optional)

string

The Python transform configuration. See Python Transform Object for supported options.

Joins:

Property

Type

Description

joinType

string

The join type. Supported: [“left”, “inner”], Default: “left”

condition

string

The join condition.

Additional Properties:

Property

Type

Description

selectExp (optional)

array

An array of select expressions. Items: string

whereClause (optional)

array

An array of where clauses. Items: string

pythonTransform (optional)

string

The Python transform configuration. See Python Transform Object for supported options.

Kafka

The sourceKafkaReader object contains the following properties:

Property

Type

Description

readerOptions

object

Options for reading from Kafka. See definitions_sources.json schema for supported options.

selectExp (optional)

array

An array of select expressions. Items: string

whereClause (optional)

array

An array of where clauses. Items: string

schemaPath (optional)

string

The schema path.

pythonTransform (optional)

string

The Python transform configuration. See Python Transform Object for supported options.

Kafka SQL

In progress

Python

The sourcePython object contains the following properties:

Property

Type

Description

functionPath (optional)

string

The path to the Python file, which should live in the python_functions subdirectory.

pythonModule (optional)

string

The module to import the Python function from.

tokens

object

A dictionary of tokens that will be passed to the Python function. This allows you to pass in substitution variables from the data flow spec.

Important

  • You must select one of functionPath or pythonModule.

SQL

The sourceSql object contains the following properties:

Property

Type

Description

sqlPath (optional)

string

The path to the SQL file, which should live in the dml subdirectory.

sqlStatement (optional)

string

The SQL statement to execute.

Important

  • While the sqlPath and sqlStatement properties are optional you must select one.

  • If both sqlPath and sqlStatement are provided, sqlStatement will take precedence.

Python Transform Object

The pythonTransform object can be used to specify a Python transform function to be applied to the dataframe post read. It can contain the following properties:

Property

Type

Description

functionPath (optional)

string

The path to the Python file, which should live in the python_functions subdirectory.

module (optional)

string

The module to import the Python function from.

tokens (optional)

object

A dictionary of tokens that will be passed to the Python function. This allows you to pass in substitution variables from the data flow spec.

Important

  • You must select one of functionPath or module.