
Tags and attribution

In about 25 minutes, you'll tag classic compute and configure serverless usage policies so that system.billing.usage rolls up by team or project.

Prereqs: Infra setup

What you'll walk away with

Consistent custom_tags on your billable rows. Classic clusters, warehouses, and pools get tagged with custom tags. Serverless notebooks, jobs, Lakeflow pipelines, and serving endpoints get tagged through serverless usage policies (the docs also call these budget policies, and the billing column is usage_metadata.budget_policy_id).

Prerequisites

  • Workspace admin for tagging compute and creating serverless usage policies.
  • Account admin for workspace-level tags through the Account API and for tag-aware budgets on Budget alerts.

Warning: Tags apply from creation forward. Historical rows stay untagged. Start early if you need chargeback.

Serverless compute

Public Preview: serverless usage policies

Serverless notebooks, jobs, Lakeflow pipelines, and model serving pick up tags from policies, not from cluster tags.

  1. Avatar → Settings → Compute.
  2. Next to Serverless usage policies, click Manage.
  3. Create → name the policy → add tag pairs (for example team:data-engineering).
  4. Permissions → Grant access → assign User or Manager roles.

If a user has one policy assigned, it auto-attaches. With more than one, they pick at creation time. If they pick nothing, the UI may default to the first policy alphabetically. Either way, only new usage gets the tags.
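
The attachment rules above can be read as a small decision function. This is only a toy model of the behavior described in this section, not a Databricks API: resolve_policy and its arguments are hypothetical names, and the alphabetical fallback is something the UI may do, not a guarantee.

```python
from typing import Optional, Sequence


def resolve_policy(assigned: Sequence[str], picked: Optional[str] = None) -> Optional[str]:
    """Toy model: which policy name ends up on new serverless usage."""
    if not assigned:
        # No policy assigned to the user: nothing attaches.
        return None
    if len(assigned) == 1:
        # Exactly one policy: it auto-attaches.
        return assigned[0]
    if picked is not None:
        if picked not in assigned:
            raise ValueError(f"{picked!r} is not assigned to this user")
        return picked
    # Several policies and no explicit pick: the UI *may* default to the
    # first name alphabetically (case-insensitive ordering is an assumption).
    return sorted(assigned, key=str.casefold)[0]
```

If the fallback matters for chargeback, picking the policy explicitly at creation time avoids relying on it.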

Classic compute

Custom tags apply to clusters, SQL warehouses, pools, and job compute (GA).

Cluster: Compute → cluster → Edit → Advanced options → Tags → add keys and values → confirm or restart.

SQL warehouse: SQL Warehouses → warehouse → Edit → Tags → save.

Pools and jobs: use the Pools UI or Jobs compute tags. Bundles allow up to 25 tags per job definition.

Workspace tags: account admins only. PATCH workspaces with custom_tags through the Account API.
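
As a rough sketch of the workspace-tag call, the snippet below only builds the PATCH request and does not send it. The host, the URL path, and the bearer-token header are assumptions based on the usual shape of the Account API, and build_workspace_tag_request is a hypothetical helper. Confirm the endpoint and auth method against the Account API reference for your cloud before using it.

```python
import json
import urllib.request


def build_workspace_tag_request(account_host, account_id, workspace_id, tags, token):
    """Build (but do not send) a PATCH request that sets workspace custom_tags."""
    # Assumed path; verify it in the Account API reference.
    url = f"https://{account_host}/api/2.0/accounts/{account_id}/workspaces/{workspace_id}"
    body = json.dumps({"custom_tags": tags}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace them with your own account details.
request = build_workspace_tag_request(
    "accounts.cloud.databricks.com",
    "ACCOUNT_ID",
    "WORKSPACE_ID",
    {"team": "data-engineering"},
    "TOKEN",
)
```

Sending it would be a call to urllib.request.urlopen(request) with account-admin credentials.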

The default tags (Vendor, ClusterId, ClusterName, Creator, RunName, and JobId on job compute) stay automatic.

Video: Tagging clusters for cost attribution. The walkthrough says Azure in the title, but the same tagging flow applies on AWS and GCP workspaces.

Compute policies can require tags at cluster creation (Compute → Policies). For the policy JSON and the limits, see Create and manage compute policies and the policy reference.
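
For orientation, here is a minimal sketch of what a tag-enforcing policy definition might look like, built as a Python dict and printed as JSON. The tag keys (team, project) and their values are illustrative assumptions. The fixed and allowlist types follow the compute policy reference as I understand it, so check the exact attribute paths there.

```python
import json

# Hypothetical policy definition: both tags must be present on every
# cluster created under this policy.
policy_definition = {
    # The user must choose one of the listed team values.
    "custom_tags.team": {
        "type": "allowlist",
        "values": ["data-engineering", "analytics"],
    },
    # The policy pins this tag to a single value.
    "custom_tags.project": {
        "type": "fixed",
        "value": "lakehouse-migration",
    },
}

policy_json = json.dumps(policy_definition, indent=2)
print(policy_json)
```

The printed JSON is what you would paste into the policy definition editor.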

Limits and cloud rules

  • Allowed characters: letters, digits, and + - = . , _ : @. No spaces, no /.
  • Up to 20 custom tags per workspace-managed compute resource. Bundles set their own limit for jobs (up to 25 tags per job definition).
  • Do not use the reserved key Name for custom tags.
  • Cluster tag edits often need a restart to reach the cloud instances. Workspace tags can lag up to one hour.
  • Pool workloads propagate workspace and pool tags to the cloud VMs. Cluster-only tags still show up in Databricks billing.
  • A key that matches a default key may gain an x_ prefix in the cloud. A policy conflict can hard-fail cluster creation instead.
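
The first three limits are easy to check before you apply tags. The validator below is a minimal sketch under stated assumptions: "letters" is taken to mean ASCII letters, the character rule is applied to both keys and values, empty values are accepted, and validate_tags is a hypothetical helper, not part of any Databricks SDK.

```python
import re
from typing import Dict, List

# Letters, digits, and + - = . , _ : @ (no spaces, no slash).
ALLOWED_CHARS = r"[A-Za-z0-9+\-=.,_:@]"
KEY_PATTERN = re.compile(ALLOWED_CHARS + "+")    # keys must be non-empty (assumption)
VALUE_PATTERN = re.compile(ALLOWED_CHARS + "*")  # empty values accepted (assumption)
MAX_CUSTOM_TAGS = 20
RESERVED_KEYS = {"Name"}


def validate_tags(tags: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(tags) > MAX_CUSTOM_TAGS:
        problems.append(f"{len(tags)} tags exceeds the limit of {MAX_CUSTOM_TAGS}")
    for key, value in tags.items():
        if key in RESERVED_KEYS:
            problems.append(f"key {key!r} is reserved")
        if not KEY_PATTERN.fullmatch(key):
            problems.append(f"key {key!r} has disallowed characters")
        if not VALUE_PATTERN.fullmatch(value):
            problems.append(f"value {value!r} for key {key!r} has disallowed characters")
    return problems
```

Running a check like this in CI for bundle or Terraform definitions catches bad tags before they reach cluster creation.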

Tip: GCP labels are more restrictive (length, lowercase). Expect truncation on email-like values.
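
To anticipate what an email-like value could look like as a GCP label, the sketch below approximates the GCP label value rules: lowercase letters, digits, underscores, and dashes, at most 63 characters. It is an assumption-laden preview only. The exact rewrite Databricks applies may differ, and to_gcp_label_value is a hypothetical helper.

```python
import re

GCP_LABEL_MAX_LEN = 63  # GCP limit for label values


def to_gcp_label_value(value: str) -> str:
    """Approximate a tag value as a GCP-safe label value."""
    lowered = value.lower()
    # Replace anything outside a-z, 0-9, underscore, and dash.
    # Using "_" as the replacement character is an assumption.
    cleaned = re.sub(r"[^a-z0-9_-]", "_", lowered)
    return cleaned[:GCP_LABEL_MAX_LEN]


preview = to_gcp_label_value("Jane.Doe@Example.com")
```

Short, lowercase tag values such as team slugs avoid the mismatch between Databricks billing rows and GCP billing labels.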

Example queries

Cost by team tag:

SELECT
  u.custom_tags['team'] AS team,
  SUM(u.usage_quantity * lp.pricing.effective_list.default) AS estimated_cost_usd
FROM system.billing.usage u
JOIN system.billing.list_prices lp
  ON u.sku_name = lp.sku_name
  AND u.cloud = lp.cloud
  AND u.usage_start_time >= lp.price_start_time
  AND (u.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
WHERE u.usage_date >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY 1
ORDER BY estimated_cost_usd DESC;

Untagged classic clusters (gap hunt):

SELECT
  workspace_id,
  sku_name,
  usage_metadata.cluster_id,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE usage_date >= CURRENT_DATE - INTERVAL 30 DAY
  AND custom_tags['team'] IS NULL
  AND usage_metadata.cluster_id IS NOT NULL
GROUP BY 1, 2, 3
ORDER BY total_dbus DESC;

Serverless usage by budget_policy_id:

SELECT
  u.usage_metadata.budget_policy_id,
  u.billing_origin_product,
  SUM(u.usage_quantity * lp.pricing.effective_list.default) AS estimated_cost_usd
FROM system.billing.usage u
JOIN system.billing.list_prices lp
  ON u.sku_name = lp.sku_name
  AND u.cloud = lp.cloud
  AND u.usage_start_time >= lp.price_start_time
  AND (u.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
WHERE u.usage_date >= CURRENT_DATE - INTERVAL 30 DAY
  AND u.usage_metadata.budget_policy_id IS NOT NULL
GROUP BY 1, 2
ORDER BY estimated_cost_usd DESC;

More patterns: Top 10 queries to use with System Tables.

Verify

  1. Tag a cluster with test_tag:verification, run some work, wait 2 to 4 hours, then filter system.billing.usage on that map key (custom_tags['test_tag']).
  2. Assign yourself a serverless usage policy, run serverless work, then confirm budget_policy_id is populated.
  3. In AWS Cost Explorer (or its equivalent), confirm the propagated tags when classic compute backs the bill.
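
For step 1, a small helper can generate the verification query for whichever tag key you used. This is a sketch, not an official utility: tag_check_query is a hypothetical name, and the selected columns are a choice made here, based on the columns the example queries above already use.

```python
import re

# Same character set the tagging limits allow; it excludes quotes, so the
# key can be embedded in the SQL text safely.
_KEY_PATTERN = re.compile(r"[A-Za-z0-9+\-=.,_:@]+")


def tag_check_query(key: str, days: int = 1) -> str:
    """Return SQL that lists recent usage rows carrying the given tag key."""
    if not _KEY_PATTERN.fullmatch(key):
        raise ValueError(f"unsupported tag key: {key!r}")
    return (
        "SELECT\n"
        "  usage_date,\n"
        "  sku_name,\n"
        "  usage_metadata.cluster_id,\n"
        f"  custom_tags['{key}'] AS tag_value,\n"
        "  SUM(usage_quantity) AS total_dbus\n"
        "FROM system.billing.usage\n"
        f"WHERE usage_date >= CURRENT_DATE - INTERVAL {int(days)} DAY\n"
        f"  AND custom_tags['{key}'] IS NOT NULL\n"
        "GROUP BY 1, 2, 3, 4\n"
        "ORDER BY usage_date DESC;"
    )


query = tag_check_query("test_tag")
```

In a Databricks notebook you could pass the string to spark.sql(...), or paste it into the SQL editor. An empty result after the wait window suggests the tag did not reach the billing rows.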

Where people trip

Tags missing on serverless rows

Classic tags never apply to fully serverless runs. Use a serverless usage policy.

Cluster creation fails inside a policy

Rename the conflicting keys. For example, use x_vendor instead of colliding with the defaults.
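
To spot collisions before cluster creation fails, you can compare your keys against the default tag keys listed earlier. The sketch below assumes case-insensitive matching and an x_ plus lowercase rename, mirroring the x_vendor example. suggest_renames is a hypothetical helper.

```python
from typing import Dict

# Default keys named in this guide.
DEFAULT_TAG_KEYS = {"Vendor", "ClusterId", "ClusterName", "Creator", "RunName", "JobId"}
_DEFAULTS_LOWER = {key.lower() for key in DEFAULT_TAG_KEYS}


def suggest_renames(tags: Dict[str, str]) -> Dict[str, str]:
    """Map each colliding custom key to a suggested replacement key."""
    return {
        key: "x_" + key.lower()
        for key in tags
        # Case-insensitive comparison is an assumption.
        if key.lower() in _DEFAULTS_LOWER
    }


renames = suggest_renames({"Vendor": "acme", "team": "data-engineering"})
```

Apply the suggested names in both the cluster spec and the policy definition so the two stay consistent.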

Cloud billing lacks cluster tags on pooled workloads

Move the tags to the pool or the workspace, or rely on Databricks system.billing.usage for attribution.

Policy never attaches to an old notebook

Policies are not retroactive. Update the notebook compute selector (More…) to pick the policy.
