Lakebase CDF Setup for LakeTS
Lakebase CDF streams data from Lakebase to a Unity Catalog Managed Table via CDC (Change Data Capture). It gives you:
- Cold-storage tiering to a Unity Catalog Managed Table
- Query the cold tier directly in Databricks (Spark / Databricks SQL)
- Analytics on historical data using Spark
Prerequisites
- A Databricks workspace with Lakebase enabled
- A Lakebase autoscale instance running PostgreSQL 17 or later
- LakeTS installed on that instance (Quickstart)
- At least one ChronoTable to sync
How It Works
ChronoTable (partitioned) --trigger--> Shadow Table (unpartitioned)
|
Lakebase CDF
|
UC Managed Table (append log)
|
Current-State View (deduplicated)
Lakebase CDF does not support partitioned tables directly. LakeTS uses a shadow table pattern: an unpartitioned table with REPLICA IDENTITY FULL receives forwarded writes via a trigger, and Lakebase CDF syncs the shadow table to the Unity Catalog Managed Table.
Step 1: Enable CDF on a ChronoTable
-- The ChronoTable must already exist
SELECT lakets.enable_sync('metrics');
This creates:
_shadow_metrics— unpartitioned table with same schema- A trigger that forwards all INSERT/UPDATE/DELETE to the shadow table
- Sets
REPLICA IDENTITY FULLon the shadow table
Verify:
SELECT sync_enabled, shadow_table_name
FROM lakets._chronotable_registry WHERE table_name = 'metrics';
-- sync_enabled=true, shadow_table_name=_shadow_metrics
Step 2: Configure Lakebase CDF in Databricks
- Open your Lakebase instance in the Databricks UI
- Navigate to Branch overview > CDF tab
- Click Start sync
- Select the schema containing your shadow tables (e.g.,
public) - Configure the destination Unity Catalog catalog and schema
- Start the sync
The shadow table _shadow_metrics will appear as lb__shadow_metrics_history in your Unity Catalog.
Step 3: Monitor Sync Status
From Lakebase:
SELECT table_oid::regclass AS shadow, status, committed_lsn
FROM wal2delta.tables;
-- status: STREAMING or SNAPSHOTTING
-- committed_lsn: latest synced position
Step 4: Query Data from the Unity Catalog Managed Table
The UC Managed Table is an append-only CDC log. Each row has:
_change_type: insert, delete, update_preimage, update_postimage_timestamp: when the change was captured_lsn: Log Sequence Number_xid: Transaction ID
To get the current state:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY device ORDER BY _lsn DESC) AS rn
FROM catalog.schema.lb__shadow_metrics_history
WHERE _change_type IN ('insert', 'update_postimage', 'delete')
) WHERE rn = 1 AND _change_type != 'delete';
Step 5: Querying across tiers
The two tiers are queried independently, each in its own engine — LakeTS does not federate the Unity Catalog table back into Postgres.
Recent (hot) data lives in Lakebase and is queried over Postgres:
-- Hot data from Lakebase (Postgres)
SELECT time, device, cpu FROM metrics WHERE time > now() - interval '7 days'
ORDER BY time DESC;
Historical (cold) data lives in the Unity Catalog Managed Table and is queried in Databricks (Spark / Databricks SQL):
-- Cold data from the UC Managed Table, queried in Databricks
SELECT time, device, cpu FROM catalog.schema.metrics_archive WHERE time <= now() - interval '7 days'
ORDER BY time DESC;
Disabling CDF
SELECT lakets.disable_sync('metrics');
This drops the shadow table, removes the trigger, and updates the registry. It does not delete data already synced to the UC Managed Table.
Limitations
| Limitation | Workaround |
|---|---|
| Partitioned tables can't be synced | Shadow table pattern (handled by LakeTS) |
| Schema changes break sync | Create new shadow table, re-enable CDF, backfill |
| Unidirectional (Lakebase → UC only) | Re-ingest from the UC Managed Table via Databricks Job for re-heating |
| PostGIS, pgvector, composite types not supported | Exclude from shadow table or cast to TEXT |
| Empty tables not synced | First INSERT triggers sync start |
Supported Data Types
| Supported | Not Supported |
|---|---|
| BOOLEAN, INT, BIGINT, FLOAT, DOUBLE | PostGIS geometry/geography |
| TEXT, VARCHAR, CHAR | pgvector embeddings |
| DATE, TIMESTAMP, TIMESTAMPTZ | Composite types |
| JSONB (as STRING), ENUM (as STRING) | hstore |
| NUMERIC/DECIMAL |