Scale-to-zero & lifecycle · Twill DB docs

The idea

Scale-to-zero rests on one invariant: compute is stateless — all durable state lives in object storage behind the Storage seam. A warm instance is just a cache plus a CPU; destroying it loses nothing durable, so the controller is free to stop the engine whenever a database is idle and reconstruct an equivalent instance on the next connection.

At a glance: no compute cost at rest (only object-storage bytes bill while idle); cold-start cost is process start plus cache warm; one writer lease per database.

Scale-to-zero needs an s3:// backend

Idling compute to nothing requires storage disaggregation: durability must bottom out on object storage so that stopping the engine loses nothing. A pure-embedded file:// app has no disaggregation — its durable state is a local file the process owns, so there is nothing to scale to. Use s3://, r2://, or gs:// for scale-to-zero.
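As a rough illustration of the rule above, a deployment could check the connection URL's scheme before enabling the lifecycle controller. The helper name `supports_scale_to_zero` and the constant are hypothetical and not part of any Twill API; only the three schemes come from this page.

```rust
/// URL schemes that put durable state on object storage.
/// The list mirrors the schemes named on this page; the helper is hypothetical.
const DISAGGREGATED_SCHEMES: [&str; 3] = ["s3://", "r2://", "gs://"];

/// Returns true when the connection URL points at an object-storage backend,
/// which is the precondition for scale-to-zero.
fn supports_scale_to_zero(url: &str) -> bool {
    DISAGGREGATED_SCHEMES
        .iter()
        .any(|scheme| url.starts_with(*scheme))
}
```

A `file://` URL returns false here, matching the point that an embedded app has nothing to scale to.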

The lifecycle controller

The twill-controller crate is a thin, stateless supervisor. It composes the engine's existing primitives rather than reimplementing them: opening a Database acquires the writer fence and replays the WAL — that is the cache warm — and dropping it releases the fence. On top of that the controller adds the state machine, an idle-timeout reaper, a lease heartbeat, and thundering-herd handling.

The controller owns no durable state

Every durable byte lives in object storage behind the storage seam. The controller holds only in-memory lifecycle bookkeeping; it is restartable and replaceable, and reconstructs from storage. It is never on the data path — it supervises instances, it does not proxy SQL.

  Cold ──start──▶ Warming ──opened──▶ Active ──no work──▶ Idle ──timeout──▶ Stopping
   ▲                 │  (open fails)       ▲     │                              │
   │                 ▼                     └─────┘  (new connection re-activates)│
   └─────────────────┴───────────────────────────────────────────────────────────┘
                                  (Stopping always lands back in Cold)

Cold → Warming → Active → Idle → Stopping → Cold. A new connection pulls an Idle instance back to Active; a failed warm returns cleanly to Cold; Stopping releases the fence and lands back in Cold.

State	Meaning	Leaves to
Cold	No process. Only object-storage bytes bill at rest.	Warming (on first connection)
Warming	Cold-starting: handle open + fence acquire + WAL replay (cache warm).	Active (opened) / Cold (open fails)
Active	Serving connections; cache warm; writer lease held.	Idle (no active leases)
Idle	Warm but with zero active connections; lease still heartbeated.	Active (new connection) / Stopping (idle timeout)
Stopping	Tearing down: drop the handle so the engine releases the fence.	Cold
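The table above can be read as a transition function. The following is a minimal sketch, not the controller's actual types: the `State` variants come from the table, while the `Event` names and `next_state` are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State { Cold, Warming, Active, Idle, Stopping }

/// Hypothetical event names for the edges in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event { Connect, Opened, OpenFailed, NoWork, IdleTimeout, Stopped }

/// Returns the next state, or None when the event is not a legal
/// transition out of the current state.
fn next_state(state: State, event: Event) -> Option<State> {
    use Event::*;
    use State::*;
    match (state, event) {
        (Cold, Connect) => Some(Warming),     // first connection starts a warm
        (Warming, Opened) => Some(Active),    // open succeeded
        (Warming, OpenFailed) => Some(Cold),  // failed warm returns to Cold
        (Active, NoWork) => Some(Idle),       // no active leases
        (Idle, Connect) => Some(Active),      // new connection re-activates
        (Idle, IdleTimeout) => Some(Stopping),
        (Stopping, Stopped) => Some(Cold),    // fence released
        _ => None,
    }
}
```

Returning `None` for every other pair makes illegal transitions, such as Cold straight to Stopping, explicit rather than silently ignored.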

Cold start

A cold start is exactly: process start + cache warm + fence acquire + WAL replay. In the controller these collapse into one step — Database::open(url) acquires the single-writer fence and replays the durable WAL, which is what warms the instance. The dominant tail term is cache warm: the first reads after a cold start miss the local cache and fall through to object storage, so a large random working set is the worst case.

Idle reaper

A background reaper runs every reap_interval. For each warm instance with no active leases it moves Active → Idle, and once an instance has sat Idle past idle_timeout (and keep_warm is off) it tears the handle down — Stopping → Cold — releasing the fence. The lifecycle and heartbeat threads live in the controller, deliberately not in the embedded engine core, which stays thread-free so embedders own their own scheduling.
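One reaper pass might look roughly like the sketch below. `Instance`, `Phase`, and `reap` are hypothetical names; only `idle_timeout` and `keep_warm` come from this page. For brevity the sketch folds Stopping into Cold and omits the lease heartbeat and the actual handle drop.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase { Active, Idle, Cold }

/// Hypothetical in-memory bookkeeping for one warm database.
struct Instance {
    phase: Phase,
    active_connections: usize,
    idle_since: Option<Instant>,
    keep_warm: bool,
}

/// One reaper pass, intended to run every reap_interval.
fn reap(instances: &mut [Instance], now: Instant, idle_timeout: Duration) {
    for inst in instances.iter_mut() {
        match inst.phase {
            // Active with nothing connected: start the idle clock.
            Phase::Active if inst.active_connections == 0 => {
                inst.phase = Phase::Idle;
                inst.idle_since = Some(now);
            }
            // Idle past the timeout and not pinned by keep_warm: tear down.
            Phase::Idle if !inst.keep_warm => {
                let idle_for = inst
                    .idle_since
                    .map(|since| now.duration_since(since))
                    .unwrap_or_default();
                if idle_for > idle_timeout {
                    // The real controller would drop the handle here,
                    // which releases the fence (Stopping -> Cold).
                    inst.phase = Phase::Cold;
                    inst.idle_since = None;
                }
            }
            _ => {}
        }
    }
}
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the pass deterministic and easy to test.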

Single-writer lease heartbeat

The writer lease is durable and fenced by a monotonic CAS epoch. The reaper heartbeats it for every warm instance (Active or Idle) by calling Database::renew_lease(); if a renewal fails — the instance has been fenced by a newer writer — the controller treats it as fatal, drops the handle, and returns the instance to Cold. Split-brain is impossible by construction: only one CAS epoch wins each append, so a stalled writer's appends are simply rejected.
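The fencing idea can be modelled in a few lines. This is a toy, in-process sketch: an `AtomicU64` stands in for the durable lease record (which in a real deployment would presumably be a conditional write against object storage), and `LeaseRecord`, `Writer`, and `Fenced` are hypothetical names rather than the engine's types.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Error returned when a newer writer has taken over the lease.
#[derive(Debug, PartialEq)]
struct Fenced { held: u64, current: u64 }

/// Stand-in for the durable lease record.
struct LeaseRecord { epoch: AtomicU64 }

/// A writer remembers the epoch it acquired.
struct Writer { epoch: u64 }

impl LeaseRecord {
    fn new() -> Self {
        Self { epoch: AtomicU64::new(0) }
    }

    /// Take over the lease by bumping the epoch with a compare-and-swap.
    /// Exactly one contender wins each epoch; losers retry from the new value.
    fn acquire(&self) -> Writer {
        let mut seen = self.epoch.load(Ordering::SeqCst);
        loop {
            match self.epoch.compare_exchange(
                seen, seen + 1, Ordering::SeqCst, Ordering::SeqCst,
            ) {
                Ok(_) => return Writer { epoch: seen + 1 },
                Err(actual) => seen = actual,
            }
        }
    }

    /// Heartbeat: succeeds only while the writer's epoch is still current.
    fn renew(&self, writer: &Writer) -> Result<(), Fenced> {
        let current = self.epoch.load(Ordering::SeqCst);
        if current == writer.epoch {
            Ok(())
        } else {
            Err(Fenced { held: writer.epoch, current })
        }
    }
}
```

In this model a stalled writer that wakes up after a takeover gets `Err(Fenced { .. })` on its next renewal, which is the signal the controller treats as fatal.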

Keep-warm & thundering-herd admission

Three mechanisms bound the cold-start tail under load:

MUST dedupe: N concurrent start calls for one cold database trigger exactly one Warming transition; the rest wait on that single warm rather than each spawning a process.
SHOULD admit under a cap: a bounded warm-admission semaphore (max_concurrent_warms) limits how many distinct databases warm at once, so a herd of many cold databases cannot saturate CPU or the object store's request budget.
MAY keep warm: with keep_warm on, idle instances stay resident past idle_timeout to cut post-idle latency for latency-critical, low-traffic databases.
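The dedupe and admission-cap rules above can be sketched with standard-library primitives. This is an illustration under assumptions, not the controller's implementation: `Controller`, `WarmGate`, and `start` are hypothetical, a counter increment stands in for the real warm, and only `max_concurrent_warms` is a name from this page.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex, OnceLock};

/// Counting semaphore that caps how many databases warm at once.
struct WarmGate { free: Mutex<usize>, released: Condvar }

impl WarmGate {
    fn new(max_concurrent_warms: usize) -> Self {
        Self { free: Mutex::new(max_concurrent_warms), released: Condvar::new() }
    }

    /// Runs `warm` while holding one permit, blocking until one is free.
    fn run<T>(&self, warm: impl FnOnce() -> T) -> T {
        let mut free = self.free.lock().unwrap();
        while *free == 0 {
            free = self.released.wait(free).unwrap();
        }
        *free -= 1;
        drop(free); // do not hold the mutex during the warm itself
        let out = warm();
        *self.free.lock().unwrap() += 1;
        self.released.notify_one();
        out
    }
}

struct Controller {
    gate: WarmGate,
    /// One slot per database; OnceLock runs its initializer exactly once.
    slots: Mutex<HashMap<String, Arc<OnceLock<usize>>>>,
    /// Counts warms actually performed (for demonstration only).
    warms: AtomicUsize,
}

impl Controller {
    fn new(max_concurrent_warms: usize) -> Self {
        Self {
            gate: WarmGate::new(max_concurrent_warms),
            slots: Mutex::new(HashMap::new()),
            warms: AtomicUsize::new(0),
        }
    }

    /// N concurrent calls for the same cold database trigger one warm;
    /// the other callers block inside get_or_init until it finishes.
    fn start(&self, db: &str) -> usize {
        // The map lock is released at the end of this statement.
        let slot = self.slots.lock().unwrap().entry(db.to_string()).or_default().clone();
        *slot.get_or_init(|| {
            // Stand-in for the real warm (opening the database handle).
            self.gate.run(|| self.warms.fetch_add(1, Ordering::SeqCst) + 1)
        })
    }
}
```

The sketch never clears a slot, so it only models the first warm of each database; a real controller would reset the slot when the instance returns to Cold. A panic inside `warm` would also leak a permit here, which a production version would guard against.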

The cost trade-off

Scale-to-zero trades a cold-start tail for zero idle compute. The single knob with the most leverage is idle_timeout:

Workload shape	idle_timeout	keep_warm
Bursty, recurring (every few minutes)	longer	off
Latency-critical, low traffic	moderate	on
Truly rare / archival	short (default 30 s)	off
Predictable spike (deploy / cron)	default	pre-warm ahead

Too short an idle_timeout pays the cold-start tax repeatedly on bursty-but-recurring traffic; too long wastes compute and defeats scale-to-zero. Tune it against each database's inter-arrival distribution, and reach for keep_warm only where the cold-start tail actually hurts.
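To make the trade-off concrete, the toy model below replays a list of request arrival times against a given idle_timeout and counts cold starts and warm-but-idle seconds. It is a simplification under stated assumptions: requests are instantaneous, arrivals are sorted ascending, and reap_interval granularity and cold-start duration are ignored. `simulate` and `Cost` are hypothetical names.

```rust
/// Outcome of replaying a trace against one idle_timeout setting.
struct Cost {
    cold_starts: u32,
    warm_idle_secs: u64,
}

/// `arrivals` are request times in seconds, sorted ascending.
fn simulate(arrivals: &[u64], idle_timeout: u64) -> Cost {
    let mut cold_starts = 0;
    let mut warm_idle_secs = 0;
    let mut last: Option<u64> = None;
    for &t in arrivals {
        match last {
            // Next request arrived before the reaper would fire: still warm,
            // and the whole gap was spent idle on paid compute.
            Some(prev) if t.saturating_sub(prev) <= idle_timeout => {
                warm_idle_secs += t.saturating_sub(prev);
            }
            // Gap exceeded the timeout: paid idle_timeout seconds of idle
            // compute, then a cold start on arrival.
            Some(_) => {
                warm_idle_secs += idle_timeout;
                cold_starts += 1;
            }
            // Very first request always cold-starts.
            None => cold_starts += 1,
        }
        last = Some(t);
    }
    // After the final request the instance idles until the timeout fires.
    if last.is_some() {
        warm_idle_secs += idle_timeout;
    }
    Cost { cold_starts, warm_idle_secs }
}
```

For a request every 120 s (`[0, 120, 240, 360]`), a 30 s timeout gives 4 cold starts and 120 idle seconds, while a 300 s timeout gives 1 cold start and 660 idle seconds. That is the bursty-but-recurring row of the table in miniature.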

Stopping is never a stopped commit

Scale-to-zero never compromises durability. The controller stops an idle engine, never a commit: an instance only tears down once it has no active leases, and the engine's append_wal is durable before any commit is acked. Dropping the warm handle releases the fence cleanly for the next writer.

Related pages

Connect to your database: pick the s3:// backend that makes scale-to-zero possible.
Branching: many branches can exist with zero running instances; branches are orthogonal to lifecycle.
Quickstart: go disaggregated with a one-line connection-string change.
Lifecycle & Controller (spec): the full state machine, fencing, and thundering-herd model in the design spec.