Skip to main content

Quickstart

Install LakeTS on a Lakebase instance, create a ChronoTable, run your first time-series query. The whole flow takes about five minutes.

Prerequisites

  • A Databricks workspace with Lakebase enabled
  • A Lakebase autoscale instance running PostgreSQL 17 or later (LakeTS is not supported on provisioned instances or earlier Postgres versions)
  • A PostgreSQL client (psql, DBeaver, or any Postgres-compatible tool)

1. Install LakeTS

Pick the install path that matches how you got the LakeTS code. All three paths produce the same end state.

Download the signed single-file release, verify it, then run it against your Lakebase instance:

curl -LO https://github.com/databricks-solutions/lakets/releases/latest/download/lakets.sql
curl -LO https://github.com/databricks-solutions/lakets/releases/latest/download/lakets.sql.sha256
sha256sum -c lakets.sql.sha256

psql -q -h <your-lakebase-host> -U <user>@databricks.com -d <database> -f lakets.sql

Verify the install:

SELECT version, installed_at, modules FROM lakets._version
ORDER BY installed_at DESC LIMIT 1;

2. Create your first ChronoTable

A ChronoTable is a time-partitioned table — LakeTS's core abstraction for time series data.

CREATE TABLE metrics (
time TIMESTAMPTZ NOT NULL,
device TEXT NOT NULL,
cpu DOUBLE PRECISION,
memory DOUBLE PRECISION
);

INSERT INTO metrics
SELECT now() - (i || ' minutes')::interval, 'sensor_' || (i % 10),
50 + 30 * sin(i::float / 100), 40 + 20 * cos(i::float / 200)
FROM generate_series(1, 10000) AS s(i);

-- Convert to a ChronoTable with 1-day partitions
SELECT lakets.create_chronotable('metrics', 'time', '1 day');

Check the partitions:

SELECT * FROM lakets.show_chunks('metrics');

3. Run your first time-series query

Aggregate by arbitrary intervals with time_bucket:

SELECT
lakets.time_bucket('1 hour'::interval, time) AS hour,
device,
avg(cpu) AS avg_cpu,
max(memory) AS max_mem
FROM metrics
WHERE time > now() - interval '1 day'
GROUP BY 1, 2
ORDER BY 1 DESC;

Get the first and last reading per device with lakets.first / lakets.last:

SELECT
device,
lakets.first(cpu, time) AS first_reading,
lakets.last(cpu, time) AS latest_reading
FROM metrics
GROUP BY device;

That's the core LakeTS loop: a single table call turned a regular Postgres table into a time-partitioned ChronoTable, and standard SQL queries now use time-aware functions automatically.

What's next

A typical LakeTS deployment progresses through three phases. Work through them in order; each step uses the work from the previous step.

Phase 1 — model your data

You already created your first ChronoTable above. Before touching anything else, finalize the table shape:

  • If your writes carry tags + fields (host, region, env, cpu, memory), use create_metric_table for a multi-metric ChronoTable
  • Confirm tag cardinality is sane — see Manage tag cardinality before you scale ingest
  • Pick a chunk interval that matches your retention shape (most workloads land on 1 day)

Phase 2 — decide where data lives over time

This is the most important step and the easiest one to skip. Choose retention and tiering policies before you accumulate data:

Phase 3 — build your read patterns

Now you can think about how the application or dashboard reads the data:

  • Dashboards over recent data — define RollUps so dashboards don't re-scan raw rows
  • "What's the current value?" widgets — turn on the Last Value Cache for sub-10ms reads
  • Long-window queries — query the cold tier directly in Databricks (Spark / Databricks SQL)
  • Real-time alerts — add threshold and deadman alerts
  • High-throughput writes — use ingest_batch instead of per-row inserts
  • Cross-team analyticssync RollUps to Unity Catalog via Lakebase CDF so Spark, BI, and ML can read them
  • Observability of LakeTS itself — see Monitoring

Reference + further reading

  • How It Works — the internals of ChronoTables, RollUps, and Lakebase CDF
  • Reference — full catalog of 69 functions, 2 aggregates, 6 triggers, 8 metadata tables — organized by topic
  • Troubleshooting — when something doesn't behave the way you expect
  • Databricks workflows — automated partition management, tiering, retention, and aggregate refresh