1. Get Started

Starter Journey is your roadmap for setting up Databricks the right way.

All about data

Get the data foundation right first. Every dashboard, ML model, and GenAI agent reads from the same tables, so when those tables are wrong, everything downstream is wrong too: reports mislead, models train on noise, and agents fill the gaps with fiction.

The three workload families on Databricks each lean on the foundation in a slightly different way:

  • BI & Analytics needs reliable data: accurate, complete, and served through a governed pipeline so reports match reality.
  • Predictive AI & ML needs high-quality data: deduplicated, correctly typed, and versioned so models train on facts and experiments stay reproducible.
  • GenAI & Agents needs curated data: structured, current, and indexed so the LLM stays grounded.

All three run on the same platform, answer to the same catalog, and read from the same tables. The foundation is identical. Only the workload on top changes.

The Starter Journey is therefore a reference guide. It sits between the Databricks docs, your cloud setup, and the governance calls you have to make, and it gives you the shortest path to a working MVP: the minimum steps, in the right order, with links to the official docs where the details live. The goal is to get you from an empty account to a production-ready environment without the detours.

What you'll build

Each section starts where the last one left off, taking you from an empty Databricks account to a production-ready environment:

| #  | Section                    | What you'll have when done |
|----|----------------------------|----------------------------|
| 2  | Before you start           | A clear understanding of workspaces, Unity Catalog, and your cloud tenant model |
| 3  | Infra setup                | Workspaces provisioned, users and groups added, SSO activated |
| 4  | Cost monitoring            | Day-zero visibility: imported usage dashboard, optional packaged dashboards, tags, and account budgets |
| 5  | Data governance strategy   | A catalog/schema structure that fits your organization's size |
| 6  | Access your data           | Cloud storage connected, external systems accessible via managed connectors |
| 7  | Build the first pipeline   | A working medallion pipeline (bronze → silver → gold) |
| 8  | Automation & orchestration | The pipeline running on a schedule with retries and notifications |
| 9  | Query and explore          | Interactive SQL queries running against your lakehouse data |
| 10 | Databricks AI/BI           | Dashboards, Genie Spaces, and apps surfacing data to business users |
| 11 | Business semantics         | Self-paced path to activate governed metric views and Genie-ready metadata, with a hands-on checkpoint across SQL, AI/BI, and Genie |
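
To make the medallion pipeline listed for section 7 a little more concrete, here is a minimal sketch of the bronze → silver → gold idea in plain Python. It uses no Databricks or Spark APIs. The sample records, the field names (`order_id`, `region`, `amount`), and the helper functions `to_silver` and `to_gold` are all hypothetical and chosen only for illustration; a real pipeline would read from and write to governed tables instead of in-memory lists.

```python
from collections import defaultdict

# Bronze: raw records as they arrived (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.50"},
    {"order_id": "1", "region": "EU", "amount": "10.50"},  # duplicate row
    {"order_id": "2", "region": "US", "amount": "oops"},   # amount is not a number
    {"order_id": "3", "region": "US", "amount": "4.25"},
]


def to_silver(rows):
    """Deduplicate on order_id and cast fields to proper types.

    Rows whose amount cannot be parsed as a float are dropped.
    """
    seen = set()
    clean = []
    for row in rows:
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows with a malformed amount
        seen.add(row["order_id"])
        clean.append(
            {"order_id": int(row["order_id"]), "region": row["region"], "amount": amount}
        )
    return clean


def to_gold(rows):
    """Aggregate cleaned rows into total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

The silver step is where the "deduplicated, correctly typed" properties are enforced, and the gold step produces the kind of aggregate a report or dashboard would read. Dropping malformed rows silently is a simplification made for this sketch; in practice you would more likely route them somewhere they can be inspected.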

Work through the sidebar in order, or jump straight to the topic you need.