1. Get Started

Welcome to the Starter Journey: your roadmap to mastering Databricks.

All about data

Every data + AI project — dashboards, ML models, GenAI agents — depends on the same thing: data that is accessible, governed, and reliable. Get the data foundation wrong and everything built on top of it breaks. Dashboards mislead. Models underperform. Agents hallucinate.

The three major workload families on Databricks make this concrete:

BI & Analytics — requires reliable data. Accurate, complete, and served through a governed pipeline so reports reflect reality.
Predictive AI & ML — requires high-quality data. Deduplicated, correctly typed, and versioned so models train on facts and experiments are reproducible.
GenAI & Agents — requires curated data. Structured, up to date, and indexed so LLMs stay grounded instead of filling gaps with fiction.

All three run on the same platform, governed by the same catalog, and read from the same tables. The foundation is identical — what changes is the workload on top.

The Starter Journey is a reference guide that connects the dots between Databricks documentation, cloud setup, and governance decisions. It gives you the shortest path to a working MVP: the minimum set of steps, in the right order, with links to the official docs where details live. No filler, no tangents — just what you need to go from an empty account to a production-ready environment.

What you'll build

The guide is organized into sections, and each one builds on the previous one:

#	Section	What you'll have when done
2	Before you start	A clear understanding of workspaces, Unity Catalog, and your cloud tenant model
3	Infra setup	Workspaces provisioned, users and groups added, SSO activated
4	Cost monitoring	Day-zero visibility: imported usage dashboard, optional packaged dashboards, tags, and account budgets
5	Data governance strategy	A catalog/schema structure that fits your organization's size
6	Access your data	Cloud storage connected, external systems accessible via managed connectors
7	Build the first pipeline	A working medallion pipeline (bronze → silver → gold)
8	Automation & orchestration	The pipeline running on a schedule with retries and notifications
9	Query and explore	Interactive SQL queries running against your lakehouse data
10	Databricks AI/BI	Dashboards, Genie Spaces, and apps surfacing data to business users
11	Business semantics	Governed metric views and Genie-ready metadata (a self-paced path), validated in a hands-on checkpoint across SQL, AI/BI, and Genie

Use the sidebar to navigate through each section in order, or jump directly to the topic you need.
