
Recap and learning

You'll verify your understanding of the three core Databricks concepts in ~3 min.

Prereqs: Account Console, Workspace, Unity Catalog

The call

Infrastructure setup assumes you already know these three concepts. A gap here doesn't show up now. It shows up later as a misconfigured workspace, a broken permission, or an ungoverned data path. Answer the questions below before you move on, and if any answer feels shaky, go back a page.

Account Console

What is it? A central admin portal, similar to the AWS console or Azure portal, where you manage workspaces, Unity Catalog metastores, users, groups, service principals, billing, SCIM, and SSO.

When do you use it? When you provision or delete workspaces, set up identity, assign metastores, or review billing. Day-to-day data work happens inside a workspace, not here.

Workspace

What is it? A cloud-based environment scoped to a region where teams run notebooks, jobs, SQL queries, dashboards, and ML experiments.

How many do I need? Start with three: development, staging, and production. That follows the standard software development lifecycle. You do not need one workspace per team. Use groups and permissions instead.

How do I isolate data between workspaces? You don't, at least not at the workspace level. Data isolation comes from Unity Catalog grants. A workspace can reach any data its attached metastore governs, subject to the permissions you define.
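As a rough illustration of isolation through grants rather than workspaces, the sketch below builds the Unity Catalog `GRANT` statements a group would typically need to read a single table. The helper names (`quote`, `read_only_grants`) and the catalog, schema, table, and group names are hypothetical, chosen only for the example. The privileges themselves (`USE CATALOG`, `USE SCHEMA`, `SELECT`) are standard Unity Catalog privileges.

```python
from __future__ import annotations


def quote(identifier: str) -> str:
    """Backtick-quote a SQL identifier, escaping any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def read_only_grants(catalog: str, schema: str, table: str, group: str) -> list[str]:
    """Return the GRANT statements a group needs to read one table.

    Reading a table requires a privilege at each level of the hierarchy:
    USE CATALOG on the catalog, USE SCHEMA on the schema, SELECT on the table.
    """
    c, s, t, g = (quote(name) for name in (catalog, schema, table, group))
    return [
        f"GRANT USE CATALOG ON CATALOG {c} TO {g}",
        f"GRANT USE SCHEMA ON SCHEMA {c}.{s} TO {g}",
        f"GRANT SELECT ON TABLE {c}.{s}.{t} TO {g}",
    ]


if __name__ == "__main__":
    # Hypothetical names: a "prod" catalog and an "analysts" group.
    for statement in read_only_grants("prod", "sales", "orders", "analysts"):
        print(statement)
```

In a notebook, each statement could be passed to `spark.sql(...)`, or pasted into the SQL editor. Because the grants live in the metastore, the same group sees the same access from any workspace attached to it.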

Unity Catalog

What is it? The centralized governance layer that manages permissions, lineage, and discovery for all data and AI assets across workspaces.

Where is the data stored? In your cloud account's object storage. Unity Catalog governs access to your data. It does not store or move it.

How should you access data and AI assets? Through Unity Catalog. Route every read, write, and execution through UC so that grants are enforced, lineage is captured, and access is audited.

What should be avoided?

Danger
  • Accessing data with hardcoded credentials in notebooks or scripts.
  • Configuring data access at the cluster level using external libraries and environment variables.

Both approaches bypass UC governance. Access becomes invisible to lineage and impossible to audit or revoke centrally.
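One way to catch the first anti-pattern early is a simple source scan before code is merged. The sketch below is a minimal, non-exhaustive check, not a Databricks feature: the function name `find_hardcoded_credentials` and the pattern list are assumptions made for illustration. The patterns cover the well-known AWS access key ID prefix and the Hadoop configuration keys commonly used to inject storage credentials directly.

```python
from __future__ import annotations

import re

# Illustrative patterns only; a real scanner would need a much broader list.
SUSPECT_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "S3A key set in Spark config": re.compile(r"fs\.s3a\.(access|secret)\.key"),
    "Azure storage account key in Spark config": re.compile(r"fs\.azure\.account\.key"),
}


def find_hardcoded_credentials(source: str) -> list[tuple[int, str]]:
    """Return (line_number, label) for every suspicious line in the source text."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings


if __name__ == "__main__":
    # The key below is a made-up placeholder, not a real credential.
    notebook_source = (
        'spark.conf.set("fs.s3a.access.key", "AKIAABCDEFGHIJKLMNOP")\n'
        'df = spark.table("prod.sales.orders")\n'
    )
    for line_number, label in find_hardcoded_credentials(notebook_source):
        print(f"line {line_number}: {label}")
```

The second line of the sample, which reads a table by its three-level Unity Catalog name, produces no findings. That is the access path the page recommends.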
