13. CI/CD and DevOps

In about 15 minutes, you'll learn how to split DevOps responsibilities between Terraform and Declarative Automation Bundles (DABs), and how to run CI/CD for Databricks projects.

Prereqs: 7. Build the first pipeline: DABs, 8. Automation & orchestration: DABs, 3. Infra setup

Why this matters

Draw one line and most of the confusion goes away: anything outside the workspace is Terraform's job, anything inside it is DABs' job. Get that line wrong and you end up with brittle scripts, environments that drift apart, and a deploy only one person knows how to run.

Clicking changes into the Workspace by hand works until it doesn't. The moment a second person needs to ship, or you need staging to match prod, you want repeatable deploys, a review gate, and a way to promote a change from dev to prod. That is the same discipline that keeps application code sane, applied to data work.

Journey checklist

  • 1. Get started
  • 2. Before you start
  • 3. Infra setup
  • 4. Cost monitoring
  • 5. Data governance strategy
  • 6. Access your data
  • 7. Build the first pipeline
  • 8. Automation and orchestration
  • 9. Query and explore
  • 10. Databricks AI/BI
  • 11. Business semantics
  • 12. Data Access Control (in progress)
  • 13. CI/CD and DevOps

The two layers

Databricks DevOps is two separate concerns, each with its own tool and lifecycle.

| Layer | What it covers | Tool | Example |
| --- | --- | --- | --- |
| Platform infrastructure (external, AWS and Databricks account related) | Accounts, networks, metastores, workspaces, and IAM: anything that lives outside a Databricks workspace | Terraform | Terraform Examples |
| Databricks projects (internal, within the Workspace) | Jobs, pipelines, schemas, dashboards, and other assets a Databricks project needs | Declarative Automation Bundles (DABs) | |

What is DABs?

Declarative Automation Bundles (DABs) is infrastructure-as-code for Databricks. The simplest way to think about it: Databricks as code. Project assets live in Git as YAML and source files instead of as clicks someone made in the Workspace and hoped to remember.

  • IaC for the workspace. DABs defines jobs, pipelines, schemas, and related resources in config files. The bundle is what you deploy: one repo, one project, several environment targets.
  • A software-engineering workflow. A Databricks project is managed the same way application code is, with branches, pull requests, and CI/CD. Your team picks the branching model. Default to trunk-based development, since it keeps main deployable and spares you long-lived branches that are painful to merge.
  • CLI-native. DABs ships with the Databricks CLI. Install the CLI on a CI/CD runner such as a GitHub Actions worker in one step, then run databricks bundle deploy from the pipeline.
  • CI/CD in practice. A typical pipeline validates the bundle, runs tests, and deploys to a target workspace. See Create a GitHub Actions workflow for CI/CD for a full example.
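The validate, test, deploy sequence described above can be sketched as a small driver script that a CI runner might call. This is an illustrative sketch, not the workflow from the linked guide. The function names, the `tests` directory, the use of pytest, and the target names are assumptions; `databricks bundle validate` and `databricks bundle deploy` with `--target` are Databricks CLI commands.

```python
import subprocess
import sys
from typing import List


def pipeline_steps(target: str, tests_dir: str = "tests") -> List[List[str]]:
    """Return the CI steps in order: validate the bundle, run tests, deploy."""
    return [
        ["databricks", "bundle", "validate", "--target", target],
        # Assumed test runner and directory; swap in whatever the project uses.
        [sys.executable, "-m", "pytest", tests_dir],
        ["databricks", "bundle", "deploy", "--target", target],
    ]


def run_pipeline(target: str, dry_run: bool = True) -> List[str]:
    """Walk the steps in order and return them as printable strings.

    With dry_run=True (the default) nothing is executed, so the sequence
    can be reviewed safely. With dry_run=False each command is run.
    """
    executed = []
    for cmd in pipeline_steps(target):
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed validate or test step stops the deploy from running.
            subprocess.run(cmd, check=True)
        executed.append(" ".join(cmd))
    return executed
```

Calling `run_pipeline("dev")` returns the three commands without running them; a CI job would call `run_pipeline("dev", dry_run=False)` after installing and authenticating the Databricks CLI.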

DABs in the Workspace

DABs in VSCode

Start at Minute 20:50

GitHub code examples

| Repo | What's inside |
| --- | --- |
| bundle-examples / knowledge_base | Official Databricks reference library. Covers Genie Agents, Metric Views, Apps, Lakebase, Jobs, Pipelines, Models, Model Serving Endpoints, and Vector Search indexes. Good first stop for any bundle pattern. |
| databricks-dab-examples / flights | End-to-end worked example built around a flights dataset. Comes in three tiers (simple, advanced, bundle template) so you can follow the progression from a minimal bundle to a production-ready project. |
| databricks-dab-examples / knowledge-base | Solutions-team reference examples. Includes an Azure DevOps CI/CD pipeline, a React + Lakebase app, metric views, a uv-managed bundle, and a DAIS 2024 modular orchestration template. |

How to migrate existing Workspace assets to DABs?

The following items are covered in the video:

  • Create the DABs project and base file structure.
  • Migrate workspace assets to DABs.
  • Create a base CI/CD pipeline for your preferred DevOps tool.

Warning

The Genie Code skill presented here is not an official Databricks-supported tool. Validate generated bundles in a non-production workspace before you rely on them in CI/CD.
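
For comparison, the first two steps in the list above (create the project, migrate workspace assets) can also be approximated with the Databricks CLI alone. The sketch below is not the Genie Code skill from the video; it only builds the command sequence. The function name and the job IDs are hypothetical, and the `default-python` template is an assumed starting point. `databricks bundle init`, `databricks bundle generate job --existing-job-id`, and `databricks bundle validate` are Databricks CLI commands.

```python
from typing import List


def migration_commands(
    job_ids: List[int], template: str = "default-python"
) -> List[List[str]]:
    """Build the CLI commands for a manual, job-by-job migration.

    The commands are returned rather than executed. Note that
    `bundle init` creates a project directory, and the later commands
    are meant to be run from inside that directory.
    """
    # Step 1: scaffold the bundle project and base file structure.
    commands = [["databricks", "bundle", "init", template]]
    # Step 2: generate bundle configuration for each existing job.
    for job_id in job_ids:
        commands.append(
            [
                "databricks", "bundle", "generate", "job",
                "--existing-job-id", str(job_id),
            ]
        )
    # Step 3: check that the resulting bundle configuration is valid.
    commands.append(["databricks", "bundle", "validate"])
    return commands
```

For example, `migration_commands([101, 102])` (placeholder IDs) yields one `init` command, two `generate job` commands, and a final `validate`. As with generated bundles from any tool, run the result against a non-production workspace first.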

Monorepo or multiple repos?

Use one Git repo per Databricks project: one bundle, one deployment boundary, one owning team.

A single repo that holds every Databricks project in the org looks tidy. It isn't. The cost shows up as soon as a second team starts committing to it:

  • Merge conflicts pile up when unrelated teams touch shared folders, CI configs, or bundle targets.
  • CI gets slow and noisy. A change to one team's pipeline kicks off validation for every project in the repo.
  • Ownership blurs. When a deploy fails, there is no clear owner, and a rollback drags in assets another team never touched.
  • Release cadences collide. Team A can't ship a hotfix while Team B's long-running feature work is blocking main.

Example

At Awesome123 corp, two teams kick off separate Databricks projects at the same time. Each gets its own repo, bundle, and CI/CD pipeline.

| Team | Project | Repo | Assets |
| --- | --- | --- | --- |
| Data Engineering | Marketing C360 | databricks-marketing-c360 | Jobs, pipelines, schemas, SQL warehouses, dashboards |
| Data Science | Finance revenue prediction | databricks-finance-revenue-prediction | Training jobs, registered models, serving endpoints, dashboards |

A pipeline change in marketing C360 does not trigger CI for the finance ML project. Each team ships on its own schedule.
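
To make "one repo, one bundle" concrete, here is a minimal sketch that renders a starter `databricks.yml` for each of the two repos in the example. The bundle names, the workspace host URLs, and the `resources/*.yml` layout are assumptions for illustration; the top-level `bundle`, `include`, and `targets` keys and the `development` and `production` modes come from the bundle configuration format.

```python
from typing import Dict


def render_bundle_yaml(bundle_name: str, dev_host: str, prod_host: str) -> str:
    """Render a minimal databricks.yml with a dev and a prod target."""
    lines = [
        "bundle:",
        f"  name: {bundle_name}",
        "",
        "include:",
        "  - resources/*.yml",  # assumed location of job/pipeline definitions
        "",
        "targets:",
        "  dev:",
        "    mode: development",
        "    default: true",
        "    workspace:",
        f"      host: {dev_host}",
        "  prod:",
        "    mode: production",
        "    workspace:",
        f"      host: {prod_host}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical bundle names, keyed by the repo names from the example table.
REPOS: Dict[str, str] = {
    "databricks-marketing-c360": "marketing_c360",
    "databricks-finance-revenue-prediction": "finance_revenue_prediction",
}


def render_all(dev_host: str, prod_host: str) -> Dict[str, str]:
    """Return one independent databricks.yml per repo."""
    return {
        repo: render_bundle_yaml(name, dev_host, prod_host)
        for repo, name in REPOS.items()
    }
```

Each repo ends up with its own bundle file and its own targets, so a deploy from one repo never touches the other team's resources.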

When to use / when not to

| Situation | Use |
| --- | --- |
| Provision workspaces, networks, or cloud IAM | Terraform |
| Platform settings must match across accounts or regions | Terraform |
| Deploy Databricks projects (jobs, pipelines, notebooks, schemas) | DABs |
| Changes need review before reaching production | DABs |
| One-off notebook or prototype, single owner, nothing downstream depends on it | Neither |
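
The table boils down to the inside/outside line drawn at the top of this page. A minimal sketch of that rule as code, where the `Change` type and its field names are hypothetical and the rule deliberately simplifies the table:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    # True for assets inside a workspace (jobs, pipelines, notebooks, schemas);
    # False for platform infrastructure (workspaces, networks, cloud IAM).
    inside_workspace: bool
    # True for a one-off notebook or prototype with a single owner and
    # nothing downstream depending on it.
    throwaway: bool = False


def choose_tool(change: Change) -> str:
    """Apply the rule: throwaway work needs neither tool; otherwise
    inside the workspace means DABs and outside means Terraform."""
    if change.throwaway:
        return "Neither"
    return "DABs" if change.inside_workspace else "Terraform"
```

For example, `choose_tool(Change(inside_workspace=False))` returns `"Terraform"`, and `choose_tool(Change(inside_workspace=True))` returns `"DABs"`.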
