Workflow jobs
These scheduled Databricks Jobs drive the operational lifecycle of LakeTS. The bundle at databricks/bundles/databricks.yml deploys all of them at once.
All jobs run on serverless compute — there is no cluster to provision. Each runs its databricks/workflows/*.py file as a spark_python_task, and the Python dependencies (psycopg[binary], databricks-sdk) are declared per job in the bundle's environments block.
| Job | Schedule | What it does |
|---|---|---|
| Partition Manager | Every 6 h | Calls _ensure_partitions() — pre-creates future partitions |
| Tiering | Daily 2 AM | _get_chunks_to_tier() → tier_chunk() per candidate — validates cold chunks are durable in UC and flags them tiered (does not drop; pure Lakebase SQL, no Spark) |
| Retention | Daily 3 AM | execute_retention() — drops expired Lakebase partitions (gated on UC durability, fail-closed); never deletes from the UC Managed Table |
| RollUp Refresh | Every 15 min | refresh_rollup_cascade(): refreshes RollUps in dependency (DAG) order, so each RollUp is refreshed only after the RollUps it reads from |
Each job is idempotent and stateless — re-running it cannot corrupt data. Lakebase remains the source of truth for state (registries, watermarks, invalidation log); the jobs read that state and execute against it.
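To make the idempotency claim concrete, here is a minimal sketch of how pre-creating partitions can be safe to repeat. It is not the body of _ensure_partitions(); the daily granularity, the `<parent>_YYYYMMDD` naming scheme, and the `partition_ddl` helper are all assumptions chosen for illustration. The point is only that `CREATE TABLE IF NOT EXISTS` turns a second run into a no-op.

```python
from datetime import date, timedelta


def partition_ddl(parent: str, start: date, days_ahead: int) -> list[str]:
    """Build one CREATE TABLE IF NOT EXISTS statement per daily partition.

    Hypothetical helper: the naming scheme and daily ranges are assumptions,
    not the layout LakeTS actually uses.
    """
    statements = []
    for offset in range(days_ahead):
        day = start + timedelta(days=offset)
        next_day = day + timedelta(days=1)
        name = f"{parent}_{day:%Y%m%d}"
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{day.isoformat()}') TO ('{next_day.isoformat()}')"
        )
    return statements
```

Because the statements depend only on the inputs and each one tolerates an existing partition, running the same batch twice leaves the database unchanged.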
RollUps refresh only from Lakebase-resident source data. Buckets whose source partition has been dropped are no longer re-aggregated — the RollUp keeps its last computed value. See How RollUps work for the mechanics.
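The dependency-ordered refresh performed by the RollUp Refresh job can be pictured as a topological sort. The sketch below uses the standard library's `graphlib`; the RollUp names and the `reads_from` mapping are hypothetical, and the real refresh_rollup_cascade() may derive its order differently.

```python
from graphlib import TopologicalSorter

# Hypothetical RollUps: each key maps to the set of RollUps it reads from.
reads_from = {
    "rollup_1m": set(),
    "rollup_1h": {"rollup_1m"},
    "rollup_1d": {"rollup_1h"},
}


def refresh_order(graph: dict[str, set[str]]) -> list[str]:
    """Return RollUp names so that every RollUp follows the ones it reads from."""
    return list(TopologicalSorter(graph).static_order())


order = refresh_order(reads_from)
print(order)  # ['rollup_1m', 'rollup_1h', 'rollup_1d']
```

`TopologicalSorter` raises `graphlib.CycleError` on a circular definition, which is a reasonable failure mode for a cascade that must terminate.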
Authentication & permissions
The jobs target a Lakebase Autoscaling project and connect with machine-to-machine (M2M) OAuth — there are no static passwords. Each job runs as a Databricks service principal, and the shared helper lakebase_utils.py resolves the project's primary read-write endpoint and mints a short-lived Postgres credential for that identity on every connection, following the psycopg3 connection pattern:
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # inside a job this picks up the service principal's identity
# Resolve projects/<name> -> default branch -> read-write endpoint, then:
endpoint = "projects/<project>/branches/production/endpoints/<id>"
host = w.postgres.get_endpoint(name=endpoint).status.hosts.host
cred = w.postgres.generate_database_credential(endpoint=endpoint)
# cred.token is used as the Postgres password; it is minted inside connect()
# so any reconnect transparently gets a fresh, non-expired token (~1 h lifetime).
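The "minted inside connect()" idea can be sketched without either library by injecting the two moving parts. `make_connector`, `mint_token`, and `open_connection` are hypothetical names, not part of lakebase_utils.py; in a real job `mint_token` would presumably wrap `w.postgres.generate_database_credential(endpoint=endpoint).token` and `open_connection` would be `psycopg.connect`.

```python
from typing import Any, Callable


def make_connector(
    mint_token: Callable[[], str],
    open_connection: Callable[..., Any],
    **conn_kwargs: Any,
) -> Callable[[], Any]:
    """Return a zero-argument connect() that mints a new token on every call."""

    def connect() -> Any:
        # Minting here, rather than once at start-up, means a reconnect
        # after the token has expired never reuses the stale value.
        return open_connection(password=mint_token(), **conn_kwargs)

    return connect
```

The design choice is that the token never outlives a single connection attempt, so no refresh timer or expiry bookkeeping is needed.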
The lakebase_project bundle variable names the project; set LAKETS_LAKEBASE_ENDPOINT to skip resolution and pin a specific endpoint path.
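A minimal sketch of that precedence, assuming the helper checks the environment variable before doing any lookup. The function name `endpoint_path` and the injected `resolve` callable are hypothetical stand-ins for whatever lakebase_utils.py does internally.

```python
import os
from typing import Callable, Mapping


def endpoint_path(
    project: str,
    resolve: Callable[[str], str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the pinned endpoint path if set, otherwise resolve it from the project."""
    pinned = env.get("LAKETS_LAKEBASE_ENDPOINT", "").strip()
    if pinned:
        return pinned  # resolution is skipped entirely
    return resolve(project)
```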
You need a service principal to run these jobs, and that service principal must have permission on the Lakebase project to perform the operations the jobs execute (creating/dropping partitions, refreshing RollUps, enforcing retention).
1. A service principal executes the jobs
The bundle's prod target runs every job as a service principal via run_as:
# databricks/bundles/databricks.yml
targets:
  prod:
    run_as:
      service_principal_name: ${var.service_principal_name}
Deploy with the service principal's application ID and the target Lakebase project:
databricks bundle deploy -t prod \
  --var="service_principal_name=<sp-application-id>" \
  --var="lakebase_project=<project-name>"
The service principal must have OAuth (M2M) credentials, that is, a client ID and an OAuth secret; follow Authorize service principal access to Databricks with OAuth and Obtain an OAuth token in a machine-to-machine flow. Inside a Databricks job the SDK resolves this identity automatically; to run a job file outside Databricks, supply the same credentials via the standard environment variables:
export DATABRICKS_HOST="https://<workspace-url>/"
export DATABRICKS_CLIENT_ID="<sp-application-id>"
export DATABRICKS_CLIENT_SECRET="<sp-oauth-secret>"
(Requires databricks-sdk >= 0.81.0 for the Autoscaling w.postgres API.)
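When running a job file locally, a small pre-flight check can give a clearer error than a failed authentication deep inside the SDK. The sketch below is an optional convenience, not something the jobs do themselves; `missing_auth_vars` is a hypothetical helper name.

```python
import os
from typing import Mapping

REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
)


def missing_auth_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """List the M2M OAuth variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A caller might exit early when `missing_auth_vars()` returns a non-empty list, before constructing any Databricks client.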
2. The service principal needs a Lakebase Postgres role
The OAuth token authenticates as a Postgres role named after the service principal. That role must exist on the Lakebase project's branch and be granted the privileges the jobs use. Connect as a Lakebase admin and grant, for example:
-- Register the service principal as a Postgres role (Databricks-managed identity).
-- databricks_create_role() is provided by the databricks_auth extension; skip
-- these two statements if the role has already been created for this branch.
CREATE EXTENSION IF NOT EXISTS databricks_auth;
SELECT databricks_create_role('<sp-application-id>', 'SERVICE_PRINCIPAL');
-- Grant the privileges the maintenance jobs need.
GRANT USAGE, CREATE ON SCHEMA public, lakets TO "<sp-application-id>";
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public, lakets TO "<sp-application-id>";
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA lakets TO "<sp-application-id>";
-- Cover objects created later (new partitions, new RollUp tables):
ALTER DEFAULT PRIVILEGES IN SCHEMA public, lakets
  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO "<sp-application-id>";
CREATE on public/lakets lets the Partition Manager add partitions; Retention drops them, and Tiering only flags chunks. EXECUTE on lakets functions covers tier_chunk(), execute_retention(), _ensure_partitions(), refresh_rollup_cascade(), and the helper functions they call. Dropping a partition additionally requires ownership, because PostgreSQL restricts DROP TABLE to the table owner, the schema owner, or a superuser. If your install scripts ran as a different owner, the simplest alternative is to make the service principal the owner of the LakeTS objects (or a member of the owning role).
By default the helper uses the running identity (current_user) as the Postgres role. If your Lakebase role name differs from the service principal's application ID, override it with the LAKETS_PG_ROLE environment variable on the job.
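Putting the pieces together, the connection parameters might be assembled roughly as follows. This is a sketch, not the helper's actual code: `connection_kwargs` is a hypothetical name, and the default database name `databricks_postgres` and `sslmode=require` are assumptions you should check against your project.

```python
import os
from typing import Mapping


def connection_kwargs(
    host: str,
    token: str,
    running_identity: str,
    env: Mapping[str, str] = os.environ,
    dbname: str = "databricks_postgres",  # assumed default database name
) -> dict[str, str]:
    """Build keyword arguments for a Postgres connection.

    LAKETS_PG_ROLE, when set, overrides the running identity as the role.
    """
    role = env.get("LAKETS_PG_ROLE", "").strip() or running_identity
    return {
        "host": host,
        "dbname": dbname,
        "user": role,
        "password": token,  # the short-lived OAuth token
        "sslmode": "require",
    }
```

The resulting dictionary has the shape accepted by `psycopg.connect(**kwargs)`.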