Data Flow Spec - Source Details
The sourceDetails object can be any of the following, based on the sourceType:
Batch Files
The sourceBatchFiles object contains the following properties:
| Property | Type | Description |
|---|---|---|
| format | | The format of the batch files. Supported: ["csv", "json", "parquet", "text", "xml"] |
| path | | The path to the batch files. |
| readerOptions | | Options for reading the batch files. See definitions_sources.json schema for supported options. |
| selectExp (optional) | | An array of select expressions. |
| whereClause (optional) | | An array of where clauses. |
| schemaPath (optional) | | The schema path. |
| pythonTransform (optional) | | The Python transform configuration. See Python Transform Object for supported options. |
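
As a rough illustration of how these properties fit together, the sketch below builds a batch-file sourceDetails object as a Python dictionary and prints it as JSON. Only the property names come from the table above; the path, the expressions, and the assumption that selectExp and whereClause hold plain strings are hypothetical. `header` is a standard Spark CSV reader option, but whether it is accepted here depends on the definitions_sources.json schema.

```python
import json

# Supported formats as listed in the table above.
SUPPORTED_FORMATS = {"csv", "json", "parquet", "text", "xml"}

# Hypothetical example values; only the keys are taken from the spec.
batch_source = {
    "format": "csv",
    "path": "/data/landing/customers/",          # assumed example path
    "readerOptions": {"header": "true"},          # standard Spark CSV option
    "selectExp": ["id", "upper(name) AS name"],   # assumed: array of strings
    "whereClause": ["id IS NOT NULL"],            # assumed: array of strings
}

# Reject a format the spec does not list.
if batch_source["format"] not in SUPPORTED_FORMATS:
    raise ValueError(f"unsupported format: {batch_source['format']}")

print(json.dumps(batch_source, indent=2))
```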
Cloud Files
The sourceCloudFiles object contains the following properties:
| Property | Type | Description |
|---|---|---|
| path | | The path to the cloud files. |
| readerOptions | | Options for reading the cloud files. See definitions_sources.json schema for supported options. |
| selectExp (optional) | | An array of select expressions. |
| whereClause (optional) | | An array of where clauses. |
| schemaPath (optional) | | The schema path. |
| pythonTransform (optional) | | The Python transform configuration. See Python Transform Object for supported options. |
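
Unlike Batch Files, this object has no format property, so the file format presumably travels inside readerOptions. The sketch below assumes that the standard Auto Loader options `cloudFiles.format` and `cloudFiles.inferColumnTypes` are passed through unchanged; the paths are hypothetical, and the definitions_sources.json schema remains the authority on which options are accepted.

```python
import json

# Hypothetical cloud-files source; keys follow the table above.
cloud_files_source = {
    "path": "/Volumes/landing/orders/",           # assumed example path
    "readerOptions": {
        "cloudFiles.format": "json",              # Auto Loader option (assumed to be passed through)
        "cloudFiles.inferColumnTypes": "true",    # Auto Loader option (assumed to be passed through)
    },
    "schemaPath": "schemas/orders.json",          # assumed example path
}

print(json.dumps(cloud_files_source, indent=2))
```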
Delta
The sourceDelta object contains the following properties:
| Property | Type | Description |
|---|---|---|
| database | | The database name. |
| table | | The table name. |
| cdfEnabled | | Whether change data feed (CDF) is enabled. |
| tablePath (optional) | | The table path. |
| selectExp (optional) | | An array of select expressions. |
| whereClause (optional) | | An array of where clauses. |
| schemaPath (optional) | | The schema path. |
| readerOptions (optional) | | Additional reader options. See definitions_sources.json schema for supported options. |
| pythonTransform (optional) | | The Python transform configuration. See Python Transform Object for supported options. |
| startingVersionFromDLTSetup (optional) | | Whether to automatically set the reader option 'startingVersion' to the version from the last time the SDP Setup operation was run on the source table. This helps ensure that CDF is read from the point at which the source table was last reset (full refresh). |
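
The table marks database, table, and cdfEnabled as required and everything else as optional. A minimal sketch of a check for that rule follows; the helper name `missing_required` and the example values are hypothetical and are not part of the framework.

```python
# Properties the table above does not mark as optional.
REQUIRED_DELTA_KEYS = ("database", "table", "cdfEnabled")


def missing_required(source_details):
    """Return the required Delta properties absent from source_details."""
    return [key for key in REQUIRED_DELTA_KEYS if key not in source_details]


# Hypothetical Delta source reading the change data feed.
delta_source = {
    "database": "bronze",
    "table": "customers",
    "cdfEnabled": True,
    "startingVersionFromDLTSetup": True,
}

print(missing_required(delta_source))            # nothing missing
print(missing_required({"table": "customers"}))  # database and cdfEnabled missing
```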
Delta Join
The sourceDeltaJoin object contains the following properties:
Sources:
| Property | Type | Description |
|---|---|---|
| database | | The database name. |
| table | | The table name. |
| alias | | The alias for the table. |
| joinMode | | The join mode. Supported: ["stream", "static"], Default: "stream" |
| cdfEnabled | | Whether change data feed (CDF) is enabled. |
| tablePath (optional) | | The table path. |
| selectExp (optional) | | An array of select expressions. |
| whereClause (optional) | | An array of where clauses. |
| schemaPath (optional) | | The schema path. |
| readerOptions (optional) | | Additional reader options. See definitions_sources.json schema for supported options. |
| pythonTransform (optional) | | The Python transform configuration. See Python Transform Object for supported options. |
Joins:
| Property | Type | Description |
|---|---|---|
| joinType | | The join type. Supported: ["left", "inner"], Default: "left" |
| condition | | The join condition. |
Additional Properties:
| Property | Type | Description |
|---|---|---|
| selectExp (optional) | | An array of select expressions. |
| whereClause (optional) | | An array of where clauses. |
| pythonTransform (optional) | | The Python transform configuration. See Python Transform Object for supported options. |
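
The sketch below shows one way the three groups of properties might nest, and how the documented defaults for joinMode and joinType could be filled in. The container keys `sources` and `joins` are assumptions inferred from the "Sources:" and "Joins:" labels, and the helper `with_defaults`, the table names, and the join condition are hypothetical.

```python
def with_defaults(spec):
    """Return a copy of spec with the documented defaults filled in."""
    # joinMode defaults to "stream"; joinType defaults to "left".
    sources = [{"joinMode": "stream", **src} for src in spec.get("sources", [])]
    joins = [{"joinType": "left", **join} for join in spec.get("joins", [])]
    return {**spec, "sources": sources, "joins": joins}


# Hypothetical Delta join: streaming orders enriched with a static customers table.
delta_join_source = {
    "sources": [   # assumed key name
        {"database": "silver", "table": "orders", "alias": "o", "cdfEnabled": True},
        {"database": "silver", "table": "customers", "alias": "c",
         "joinMode": "static", "cdfEnabled": False},
    ],
    "joins": [     # assumed key name
        {"condition": "o.customer_id = c.customer_id"},
    ],
    "selectExp": ["o.*", "c.name AS customer_name"],
}

resolved = with_defaults(delta_join_source)
print(resolved["sources"][0]["joinMode"])  # stream (default applied)
print(resolved["sources"][1]["joinMode"])  # static (explicit value kept)
print(resolved["joins"][0]["joinType"])    # left (default applied)
```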
Kafka
The sourceKafkaReader object contains the following properties:
| Property | Type | Description |
|---|---|---|
| readerOptions | | Options for reading from Kafka. See definitions_sources.json schema for supported options. |
| selectExp (optional) | | An array of select expressions. |
| whereClause (optional) | | An array of where clauses. |
| schemaPath (optional) | | The schema path. |
| pythonTransform (optional) | | The Python transform configuration. See Python Transform Object for supported options. |
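
For orientation, here is a sketch of a Kafka source. The reader options shown (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`) are standard Spark Structured Streaming Kafka options; that they are accepted unchanged here is an assumption to verify against definitions_sources.json. The broker address, topic name, and select expressions are hypothetical.

```python
import json

# Hypothetical Kafka source; only readerOptions is required by the table above.
kafka_source = {
    "readerOptions": {
        "kafka.bootstrap.servers": "broker-1:9092",   # assumed broker address
        "subscribe": "orders",                        # assumed topic name
        "startingOffsets": "earliest",
    },
    # Kafka delivers key and value as binary, so a cast is a common first step.
    "selectExp": ["CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value"],
}

print(json.dumps(kafka_source, indent=2))
```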
Kafka SQL
In progress
Python
The sourcePython object contains the following properties:
| Property | Type | Description |
|---|---|---|
| functionPath (optional) | | The path to the Python file, which should live in the python_functions subdirectory. |
| pythonModule (optional) | | The module to import the Python function from. |
| tokens | | A dictionary of tokens that will be passed to the Python function. This allows you to pass in substitution variables from the data flow spec. |
Important
You must specify either functionPath or pythonModule.
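
The requirement above can be checked before a spec is deployed. The sketch below is a hypothetical helper, not framework code; the file name and token values are made up, and the order in which the two keys are checked is arbitrary because the spec does not say which one wins if both are set.

```python
def function_reference(source_details):
    """Return the property that locates the function, or raise if neither is set."""
    # Check order is arbitrary; the spec does not define precedence here.
    for key in ("functionPath", "pythonModule"):
        if source_details.get(key):
            return key
    raise ValueError("sourcePython needs functionPath or pythonModule")


# Hypothetical Python source with tokens passed through to the function.
python_source = {
    "functionPath": "get_orders.py",                   # assumed file in python_functions
    "tokens": {"catalog": "dev", "schema": "sales"},   # assumed token names
}

print(function_reference(python_source))  # functionPath
```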
SQL
The sourceSql object contains the following properties:
| Property | Type | Description |
|---|---|---|
| sqlPath (optional) | | The path to the SQL file, which should live in the dml subdirectory. |
| sqlStatement (optional) | | The SQL statement to execute. |
Important
Although sqlPath and sqlStatement are each optional, you must provide at least one of them.
If both sqlPath and sqlStatement are provided, sqlStatement takes precedence.
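
The precedence rule can be expressed in a few lines. This is a sketch of the documented behavior, not the framework's implementation: `resolve_sql` is a hypothetical helper, and resolving sqlPath relative to a `base_dir` argument is an assumption about how the dml subdirectory is located.

```python
from pathlib import Path


def resolve_sql(source_details, base_dir="dml"):
    """Return the SQL text to run, preferring sqlStatement over sqlPath."""
    statement = source_details.get("sqlStatement")
    if statement:
        # sqlStatement wins, so the file behind sqlPath is never read.
        return statement
    sql_path = source_details.get("sqlPath")
    if sql_path:
        # Assumption: sqlPath is resolved relative to the dml subdirectory.
        return (Path(base_dir) / sql_path).read_text()
    raise ValueError("sourceSql needs sqlPath or sqlStatement")


# Both properties set: the inline statement is used.
both = {"sqlPath": "orders.sql", "sqlStatement": "SELECT * FROM orders"}
print(resolve_sql(both))  # SELECT * FROM orders
```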
Python Transform Object
The pythonTransform object specifies a Python transform function that is applied to the DataFrame after it is read. It can contain the following properties:

| Property | Type | Description |
|---|---|---|
| functionPath (optional) | | The path to the Python file, which should live in the python_functions subdirectory. |
| module (optional) | | The module to import the Python function from. |
| tokens (optional) | | A dictionary of tokens that will be passed to the Python function. This allows you to pass in substitution variables from the data flow spec. |
Important
You must specify either functionPath or module.
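
To show where this object sits, the sketch below nests a pythonTransform inside a Delta source. The file name and token are hypothetical. Note that the key here is `module`, whereas the sourcePython object uses `pythonModule`.

```python
import json

# Hypothetical Delta source whose DataFrame is passed through a transform after reading.
delta_with_transform = {
    "database": "bronze",
    "table": "customers",
    "cdfEnabled": True,
    "pythonTransform": {
        "functionPath": "clean_customers.py",   # assumed file in python_functions
        "tokens": {"country": "NZ"},            # assumed token name and value
    },
}

transform = delta_with_transform["pythonTransform"]
# The transform must name its function via functionPath or module.
if not (transform.get("functionPath") or transform.get("module")):
    raise ValueError("pythonTransform needs functionPath or module")

print(json.dumps(delta_with_transform, indent=2))
```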