6. Build the first pipeline

info
  • Create the first Spark Declarative Pipeline on Databricks.

Journey checklist

  • Identify target cloud tenant(s).
  • Infra setup.
  • Data Governance Strategy.
  • Access your data.
  • Build the first pipeline.
  • Automation and orchestration.
  • Query and explore.
  • Databricks AI/BI.

Data Engineering on Databricks

Lakeflow introduction

Lakeflow in action

Lessons learned from the previous videos

For data transformation
For orchestration

Spark Declarative Pipelines

  • See the Spark Declarative Pipelines (SDP) documentation for Apache Spark 4.1.1.
  • Python (PySpark) and SQL support.
  • Process multiple sources simultaneously, whether streaming from Kafka, batch loading from cloud storage, or querying external databases.
  • Built-in incremental processing intelligently tracks changes and processes only new or modified data, dramatically reducing compute costs and pipeline runtimes.
  • Data quality is enforced through declarative expectations that you define inline with your transformations.
  • Integrated with Unity Catalog (all data assets on Databricks are governed through Unity Catalog).
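To make the points above concrete, here is a minimal SQL sketch of a declarative pipeline: a streaming table ingests raw files incrementally, and a downstream materialized view enforces an inline data-quality expectation. Table names, the volume path, and the constraint are hypothetical placeholders for illustration.

```sql
-- Incremental ingestion: only new files in the source path are processed on each update.
-- The source path and table names below are illustrative, not from this guide.
CREATE OR REFRESH STREAMING TABLE raw_orders
AS SELECT * FROM STREAM read_files('/Volumes/main/default/orders/', format => 'json');

-- Declarative expectation defined inline with the transformation:
-- rows with a NULL order_id are dropped and reported in pipeline metrics.
CREATE OR REFRESH MATERIALIZED VIEW clean_orders (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT order_id, customer_id, amount
FROM raw_orders;
```

The same pipeline could be expressed in Python (PySpark) with decorated functions; the SQL form shows the declarative style most compactly.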

SDP Features

Technical references before coding 🛡️

Create the first pipeline 🛠️

  • UI + Databricks Agent – Build a pipeline using the Databricks UI and Databricks Agent.
  • DABs – Build a pipeline using Databricks Asset Bundles.
  • MCP skills – Build a pipeline using MCP skills.
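For the DABs option, a bundle is defined in a `databricks.yml` file at the project root. The sketch below is a minimal, hypothetical example: the bundle name, catalog, schema, source-file path, and workspace host are all placeholders you would replace with your own values.

```yaml
# databricks.yml — minimal sketch of a bundle that deploys one pipeline.
# All names, paths, and the workspace host below are illustrative placeholders.
bundle:
  name: first_pipeline_bundle

resources:
  pipelines:
    first_pipeline:
      name: first_pipeline
      catalog: main          # Unity Catalog catalog for pipeline output
      schema: default        # target schema for the pipeline's tables
      libraries:
        - file:
            path: ./transformations/clean_orders.sql

targets:
  dev:
    mode: development
    workspace:
      host: https://<your-workspace>.cloud.databricks.com
```

With the Databricks CLI installed, `databricks bundle validate` checks the configuration, `databricks bundle deploy -t dev` deploys it to the dev target, and `databricks bundle run first_pipeline -t dev` triggers a pipeline update.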