6. Build the first pipeline
- Create the first Spark Declarative Pipeline on Databricks.
Journey checklist
- Identify target cloud tenant(s).
- Infra setup.
- Data Governance Strategy.
- Access your data.
- Build the first pipeline.
- Automation and orchestration.
- Query and explore.
- Databricks AI/BI
Data Engineering on Databricks
Lakeflow introduction
Lakeflow in action
Lessons learned from the previous videos
For data ingestion
- Use Lakeflow Connect.
For data transformation
- Use Spark Declarative Pipelines.
For orchestration
- Use Lakeflow Jobs.
Spark Declarative Pipelines
- Apache Spark 4.1.1 Spark Declarative Pipelines (SDP) documentation.
- Python (PySpark) and SQL support.
- Process multiple sources simultaneously, whether streaming from Kafka, batch-loading from cloud storage, or querying external databases.
- Built-in incremental processing intelligently tracks changes and processes only new or modified data, dramatically reducing compute costs and pipeline runtimes.
- Data quality is enforced through declarative expectations that you define inline with your transformations.
- Integrated with Unity Catalog (everything on Databricks is governed by Unity Catalog).
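As a minimal sketch of the multi-source point above (the broker, topic, path, and table names are hypothetical, and `read_kafka`/`read_files` availability depends on your Databricks runtime), one pipeline can declare streaming tables over both a Kafka topic and files landing in cloud object storage:

```sql
-- Hypothetical sketch: broker, topic, path, and table names are placeholders.
-- One pipeline, two sources: a Kafka topic and files in cloud object storage.
CREATE OR REFRESH STREAMING TABLE clickstream_raw
AS SELECT * FROM STREAM read_kafka(
  bootstrapServers => 'broker:9092',
  subscribe => 'clickstream'
);

CREATE OR REFRESH STREAMING TABLE orders_raw
AS SELECT * FROM STREAM read_files(
  '/Volumes/main/landing/orders/',
  format => 'json'
);
```

Because both datasets are streaming tables, each pipeline update picks up only new messages and files, which is the incremental behavior described above.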
SDP Features
Technical references before coding 🛡️
- Load data in pipelines.
- Auto Loader for incremental ingestion of data sitting in cloud object storage.
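A hedged sketch of Auto Loader-style ingestion in SDP SQL (the path, table name, and columns are hypothetical): `STREAM read_files(...)` discovers new files incrementally, so reruns only process data that has not been seen before.

```sql
-- Hypothetical path and names. New files are discovered incrementally;
-- already-processed files are not read again on the next update.
CREATE OR REFRESH STREAMING TABLE customers_bronze
AS SELECT
  *,
  _metadata.file_path AS source_file   -- keep ingestion lineage per row
FROM STREAM read_files(
  '/Volumes/main/landing/customers/',
  format => 'csv',
  header => true
);
```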
- Transform data with pipelines.
- When to use views, materialized views, and streaming tables!
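A rough rule of thumb, sketched below with hypothetical names (exact syntax may vary slightly by runtime): streaming tables for append-only ingestion, temporary views for intermediate logic you don't need to persist, and materialized views for transformed results that should be kept up to date incrementally.

```sql
-- Hypothetical names; three dataset flavors in one pipeline.
CREATE OR REFRESH STREAMING TABLE events_bronze      -- append-only ingestion
AS SELECT * FROM STREAM read_files(
  '/Volumes/main/landing/events/', format => 'json'
);

CREATE TEMPORARY VIEW events_deduped                 -- intermediate logic, not persisted
AS SELECT DISTINCT * FROM events_bronze;

CREATE OR REFRESH MATERIALIZED VIEW events_by_day    -- persisted, refreshed incrementally
AS SELECT date(event_ts) AS day, count(*) AS events
FROM events_deduped
GROUP BY date(event_ts);
```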
- Manage data quality with pipeline expectations.
- Data quality constraints and business rules defined as expectations.
- Python
- SQL
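A sketch of inline expectations (the constraint names, columns, and upstream table are hypothetical). Each expectation can record violations in metrics (the default), drop the offending rows, or fail the update:

```sql
-- Hypothetical constraints on a hypothetical orders_bronze table.
CREATE OR REFRESH STREAMING TABLE orders_silver (
  -- drop rows that violate the rule
  CONSTRAINT valid_order_id  EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  -- record violations in pipeline metrics but keep the rows (default policy)
  CONSTRAINT positive_amount EXPECT (amount > 0),
  -- stop the update entirely on violation
  CONSTRAINT known_currency  EXPECT (currency IN ('USD', 'EUR')) ON VIOLATION FAIL UPDATE
)
AS SELECT * FROM STREAM orders_bronze;
```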
Create the first pipeline 🛠️
- UI + Databricks Agent – Build a pipeline using the Databricks UI and Databricks Agent.
- DABs – Build a pipeline using Databricks Asset Bundles.
- MCP skills – Build a pipeline using MCP skills.
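For the DABs route, a minimal `databricks.yml` sketch might look like this (the bundle, catalog, schema, and file names are all placeholders; check the Databricks Asset Bundles reference for the exact fields supported by your CLI version):

```yaml
# Hypothetical databricks.yml; all names and paths are placeholders.
bundle:
  name: first_pipeline

resources:
  pipelines:
    first_pipeline:
      name: first_pipeline
      catalog: main
      schema: demo
      libraries:
        - file:
            path: ./transformations/orders.sql

targets:
  dev:
    default: true
```

With the Databricks CLI, `databricks bundle deploy -t dev` typically deploys the bundle, and `databricks bundle run first_pipeline` triggers a pipeline update.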