Skip to main content

How LakeTS Works

A deep dive into how LakeTS turns Databricks Lakebase into a full-featured time series database — explained simply.

Start here for the high-level picture, then follow the topic pages for each subsystem:

  • ChronoTables — how time-partitioned tables work under the hood
  • Time series functionstime_bucket, first, locf, interpolate, delta, rate, gapfill
  • RollUps — incremental aggregates, DAG cascade, scale optimizations (chunk-skip pruning, batch refresh, Unity Catalog export)
  • Tiering & retention — lifecycle policies, the CDF durability gate, and chunk drops
  • Lakebase CDF internals — shadow tables, triggers, CDC routing

The problem

Imagine you're collecting temperature readings from 1,000 sensors every second. That's 86 million rows per day. After a year, you have 31 billion rows.

Plain PostgreSQL struggles with this because:

  • Inserts slow down as the table grows (indexes get huge)
  • Queries scan everything even when you only want the last hour
  • No easy way to archive old data without manual partition management
  • Common patterns like "average temperature per hour with gap-filling" require complex SQL

LakeTS solves this by adding a time series layer on top of Postgres — purpose-built for Databricks Lakebase, with the added benefit of tiering cold data to a Unity Catalog Managed Table.

The big picture

LakeTS has two paths for your data:

Recent data is read straight from Lakebase over Postgres. As data ages, the tiering job validates that it is durable in the Unity Catalog Managed Table and flags the chunk tiered — at this point the partition is still resident in Lakebase. Later, the retention job drops the partition once it ages past drop_after. The cold copy is queried in Databricks (Databricks SQL, Spark, dashboards) as an ordinary Delta table. The two tiers are queried independently in their own engines; LakeTS does not federate the cold table back into Lakebase.

PathWhereSpeedCostWhat lives hereQueried with
HotLakebase (Postgres)< 10msHigherRecent data (days to weeks)Postgres / SQL clients
ColdUnity Catalog Managed Table100ms–1sLowerHistorical data (weeks to years)Databricks SQL / Spark
Key insight

Most time series queries care about recent data. By keeping only recent data in the fast Postgres layer and archiving the rest to a Unity Catalog Managed Table, you get the best of both worlds.

Summary — Postgres primitives used

LakeTS is built from these Postgres primitives — no custom extensions:

LakeTS FeatureBuilt With
ChronoTablesPARTITION BY RANGE + metadata tables
Time series functionsPL/pgSQL functions + CREATE AGGREGATE
Gap-fillinggenerate_series() + LEFT JOIN
RollUpsRegular TABLE + incremental refresh + UNION ALL view
RollUp optimizationChunk-skip pruning, batch ANY(array), Kahn's toposort, transition tables
Tiering_policy_registry + Databricks Jobs + CDF durability gate
RetentionDROP TABLE on expired partitions
Lakebase CDFShadow table + trigger + wal2delta CDC
MonitoringSQL functions over pg_stat_* + metadata

No magic. No custom extensions. Just smart composition of standard Postgres features + Unity Catalog integration.