1. Get Started
Welcome to the Starter Journey, your roadmap to mastering Databricks.
All about data
Every data + AI project — dashboards, ML models, GenAI agents — depends on the same thing: data that is accessible, governed, and reliable. Get the data foundation wrong and everything built on top of it breaks. Dashboards mislead. Models underperform. Agents hallucinate.
The three major workload families on Databricks make this concrete:
- BI & Analytics — requires reliable data. Accurate, complete, and served through a governed pipeline so reports reflect reality.
- Predictive AI & ML — requires high-quality data. Deduplicated, correctly typed, and versioned so models train on facts and experiments are reproducible.
- GenAI & Agents — requires curated data. Structured, up to date, and indexed so LLMs stay grounded instead of filling gaps with fiction.
All three run on the same platform, governed by the same catalog, and read from the same tables. The foundation is identical — what changes is the workload on top.
The Starter Journey is a reference guide that connects the dots between Databricks documentation, cloud setup, and governance decisions. It gives you the shortest path to a working MVP: the minimum set of steps, in the right order, with links to the official docs where the details live. No filler, no tangents, just what you need to go from an empty account to a production-ready environment.
What you'll build
Each section builds on the previous one. The table below lists what you will have in place at the end of each:
| # | Section | What you'll have when done |
|---|---|---|
| 2 | Before you start | A clear understanding of workspaces, Unity Catalog, and your cloud tenant model |
| 3 | Infra setup | Workspaces provisioned, users and groups added, SSO activated |
| 4 | Cost monitoring | Day-zero visibility: imported usage dashboard, optional packaged dashboards, tags, and account budgets |
| 5 | Data governance strategy | A catalog/schema structure that fits your organization's size |
| 6 | Access your data | Cloud storage connected, external systems accessible via managed connectors |
| 7 | Build the first pipeline | A working medallion pipeline (bronze → silver → gold) |
| 8 | Automation & orchestration | The pipeline running on a schedule with retries and notifications |
| 9 | Query and explore | Interactive SQL queries running against your lakehouse data |
| 10 | Databricks AI/BI | Dashboards, Genie Spaces, and apps surfacing data to business users |
| 11 | Business semantics | Governed metric views and Genie-ready metadata, activated through a self-paced path with a hands-on checkpoint across SQL, AI/BI, and Genie |
Use the sidebar to navigate through each section in order, or jump directly to the topic you need.
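
To make the medallion pattern from section 7 (bronze → silver → gold) concrete before you reach it, here is a minimal sketch in plain Python. It is not Databricks code and does not use Spark or Delta tables. The records, field names, and the `to_silver` and `to_gold` helpers are hypothetical and exist only to show what each layer is responsible for: bronze holds raw input, silver holds deduplicated and correctly typed rows, and gold holds an aggregate ready for reporting.

```python
from collections import defaultdict

# Bronze: raw records exactly as they arrived (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.50"},
    {"order_id": "1", "region": "EU", "amount": "10.50"},         # duplicate
    {"order_id": "2", "region": "US", "amount": "7.25"},
    {"order_id": "3", "region": "EU", "amount": "not-a-number"},  # bad value
    {"order_id": "4", "region": "US", "amount": "2.75"},
]


def to_silver(rows):
    """Cast fields to proper types, skip unparseable rows, deduplicate on order_id."""
    seen = set()
    clean = []
    for row in rows:
        try:
            order_id = int(row["order_id"])
            region = row["region"]
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # a real pipeline would quarantine the row, not drop it silently
        if order_id in seen:
            continue
        seen.add(order_id)
        clean.append({"order_id": order_id, "region": region, "amount": amount})
    return clean


def to_gold(rows):
    """Aggregate cleaned rows into total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

The same separation of concerns applies when the layers are tables in a lakehouse: each layer reads only from the one before it, so a data-quality fix in silver flows into every gold table built on top.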