Roadmap & Build Sequence · Serverless OLTP Engine Spec

Purpose & scope

This page turns the architecture into a build schedule. It synthesizes the recommended rollout and the build-vs-buy shopping list into an ordered set of phases, each with concrete deliverables, the specs it draws on, the gate it must pass, and a Definition of Done. It is the document an engineering lead uses to sequence work and to decide what is in scope for the next milestone.

It does not redefine component internals — those live in the per-component specs (Engine Core, Storage Interface, Object-Storage Backend, and the rest). This page is the schedule and the critical path; the component pages are the blueprints.

What this plan governs

Five phases (embedded library → object storage → server mode → controller → optional capabilities), a build-vs-buy shopping list that says which slots to build and which to inherit, the inter-phase critical path, a per-phase Definition of Done, and a milestone table. Phases gate on the experiments defined in the Benchmark Plan.

Guiding sequencing principle

The phases are ordered so that the storage seam is a configuration value, not a rebuild. The engine speaks to a single narrow Storage trait; whether that trait is satisfied by a local file or by an LSM-on-S3 backend is decided by the connection string at engine_open time (file://… vs s3://…). Because of that seam, each phase only adds a backend, a listener, or a control loop — it never reworks what came before.

The one-line ordering rule

Ship the smallest embeddable thing first; then add disaggregation (a second Storage impl); then add server mode (a wire-protocol listener over the same library); then add the controller (a lifecycle loop around the same server). Each layer is additive because the seam below it does not move.

MUST keep the Storage trait stable from Phase 1 onward; adding the ObjectStorage backend in Phase 2 MUST NOT change the trait signature the engine calls.
MUST select the storage backend by connection-string scheme at open time, so the same compiled library embeds against a file or against S3 with no rebuild.
MUST treat server mode as the embedded library wrapped in a listener; the engine binary built in Phase 1 MUST be the same engine linked into engine-server in Phase 3.
SHOULD deliver each phase to a state where it is independently demonstrable and usable, not merely a checkpoint toward a later phase.
MUST NOT fold a later phase's concern (e.g. branching, idle-stop) into an earlier phase as a shortcut; doing so couples layers the seam is meant to keep separate.
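
The seam described above can be sketched in TypeScript. This is a minimal model under stated assumptions, not the engine's actual trait: the names EngineStorage, LocalFileStorageModel, ObjectStorageModel, and openStorage are hypothetical, and both backends keep pages in memory purely so the sketch runs without a file system or a bucket. What it illustrates is that the caller never branches on the backend; only the URL scheme does.

```typescript
// Hypothetical model of the storage seam; the real trait lives inside the engine.
interface EngineStorage {
  readonly kind: string;
  readPage(pageNo: number): string | undefined;
  writePage(pageNo: number, bytes: string): void;
}

// Stand-in for LocalFileStorage (assumption: pages held in memory, not a .db file).
class LocalFileStorageModel implements EngineStorage {
  readonly kind = "local-file";
  private pages: { [pageNo: number]: string } = {};
  constructor(readonly path: string) {}
  readPage(pageNo: number): string | undefined { return this.pages[pageNo]; }
  writePage(pageNo: number, bytes: string): void { this.pages[pageNo] = bytes; }
}

// Stand-in for ObjectStorage (assumption: one in-memory "object" per page).
class ObjectStorageModel implements EngineStorage {
  readonly kind = "object-storage";
  private objects: { [key: string]: string } = {};
  constructor(readonly bucketPath: string) {}
  readPage(pageNo: number): string | undefined {
    return this.objects[`${this.bucketPath}/page-${pageNo}`];
  }
  writePage(pageNo: number, bytes: string): void {
    this.objects[`${this.bucketPath}/page-${pageNo}`] = bytes;
  }
}

// The backend is chosen by the connection-string scheme at open time.
function openStorage(url: string): EngineStorage {
  const sep = url.indexOf("://");
  if (sep < 0) throw new Error(`missing scheme in ${url}`);
  const scheme = url.slice(0, sep);
  const rest = url.slice(sep + 3);
  switch (scheme) {
    case "file": return new LocalFileStorageModel(rest);
    case "s3": return new ObjectStorageModel(rest);
    default: throw new Error(`unsupported storage scheme: ${scheme}`);
  }
}

// The "engine" side only ever sees the EngineStorage interface.
function engineRoundTrip(url: string): string | undefined {
  const storage = openStorage(url);
  storage.writePage(1, "hello");
  return storage.readPage(1);
}
```

Calling engineRoundTrip with "file://./local.db" and with "s3://bucket/db" runs the same code path against two backends, which is the additive property the phases rely on.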

Each phase stacks on the one before; the storage seam at the bottom never moves, which is what makes the sequence additive.

Phase 1 — Embedded library first

embedded · zero infra

Ship the engine as an in-process library bound into Bun via bun:ffi, backed by LocalFileStorage (a plain .db file, no network). This is the fastest route to a working demo: a Bun process opens the library, runs SQL, and persists to a local file with function-call latency. No object store, no server, no controller.

Deliverables

MUST produce libengine.a / libengine.so / libengine.dylib — the embeddable static and dynamic libraries.
MUST publish engine.h, the C-ABI header that is the stable boundary every runtime binds to (engine_open, engine_query, engine_close).
MUST ship the @twilldb/bun thin TypeScript wrapper over the raw bun:ffi symbols for ergonomic embedded use.
MUST implement LocalFileStorage as the first concrete Storage impl, satisfying the trait the engine already calls.
SHOULD include the local cache (shared-buffer page cache) so the hot path is in-process even before disaggregation arrives.

Interface frozen in this phase

// engine.h — the stable C ABI every runtime binds to. Frozen in Phase 1.
typedef void* engine_handle;

// url selects the storage backend by scheme: "file://..." (P1) or "s3://..." (P2).
engine_handle engine_open(const char* url);
const char*   engine_query(engine_handle h, const char* sql);   // returns serialized rows
void          engine_close(engine_handle h);

import { dlopen, FFIType, suffix } from "bun:ffi";

const { symbols: db } = dlopen(`libengine.${suffix}`, {
  engine_open:  { args: [FFIType.cstring], returns: FFIType.ptr },
  engine_query: { args: [FFIType.ptr, FFIType.cstring], returns: FFIType.cstring },
  engine_close: { args: [FFIType.ptr], returns: FFIType.void },
});

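// C strings must be NUL-terminated, hence the explicit "\0" in each buffer.
const h = db.engine_open(Buffer.from("file://./local.db\0"));   // P1: pure embedded
const rows = db.engine_query(h, Buffer.from("select 1 as n\0"));
db.engine_close(h);   // release the handle once the caller is done with it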

Specs it draws on

Engine Core (parser → plan → executor, MVCC, WAL) · Storage Interface (the trait + LocalFileStorage) · Local Cache (in-process page cache) · Bun Integration (bun:ffi + @twilldb/bun).

Exit criteria & gate

MUST open a file:// database from Bun via FFI, run DDL + DML + queries, and read back correct results across a process restart (file persistence).
MUST pass a basic correctness gate: a SQL conformance smoke suite plus MVCC snapshot-isolation checks (a reader sees a consistent snapshot across a concurrent committed write).
SHOULD demonstrate the @twilldb/bun wrapper end-to-end in a sample Bun app with no native build step beyond the prebuilt library.
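
The snapshot-isolation check in the gate above can be illustrated with a toy multi-version store. This is a sketch of the property being tested, not of the engine's MVCC implementation: ToyMvccStore, its logical clock, and snapshotCheck are all hypothetical names introduced for illustration.

```typescript
interface Version { commitTs: number; value: string; }

// Toy multi-version store: every commit appends a version stamped with a logical clock.
class ToyMvccStore {
  private clock = 0;
  private versions: { [key: string]: Version[] } = {};

  // A committed write becomes visible only to snapshots taken afterwards.
  commit(key: string, value: string): number {
    this.clock += 1;
    const chain = this.versions[key] || (this.versions[key] = []);
    chain.push({ commitTs: this.clock, value });
    return this.clock;
  }

  // A snapshot is the clock value at the moment the reader begins.
  beginSnapshot(): number { return this.clock; }

  // Return the newest version whose commit timestamp is <= the snapshot.
  read(key: string, snapshotTs: number): string | undefined {
    const chain = this.versions[key] || [];
    let result: string | undefined = undefined;
    for (const v of chain) {
      if (v.commitTs <= snapshotTs) result = v.value;
    }
    return result;
  }
}

// The gate's scenario: a reader keeps its view across a concurrent committed write.
function snapshotCheck(): { first?: string; second?: string; fresh?: string } {
  const store = new ToyMvccStore();
  store.commit("balance", "100");
  const snap = store.beginSnapshot();
  const first = store.read("balance", snap);
  store.commit("balance", "250");                 // concurrent committed write
  const second = store.read("balance", snap);     // same snapshot, same answer
  const fresh = store.read("balance", store.beginSnapshot());
  return { first, second, fresh };
}
```

In this model the two reads under the original snapshot agree, and only a snapshot taken after the write observes the new value.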

Phase 1 payoff

A working, persistent, in-process database in a Bun app with zero infrastructure. This validates the whole embeddability claim before a single byte touches the network, and freezes the C ABI that every later phase reuses unchanged.

Phase 2 — Add the ObjectStorage backend

embedded · scale-to-zero

Add a second Storage impl, ObjectStorage, and the database becomes disaggregated and scale-to-zero while staying embedded. Nothing about the engine or the FFI surface changes: you flip the connection string from file:// to s3://, and the same code path now durably bottoms out on object storage.

Deliverables

MUST implement ObjectStorage: an LSM page store that writes layers to S3/R2, versioned by LSN (in-memory layer → delta layers → image layers), with compaction and GC past the PITR window.
MUST implement the commit log as an ordered append log whose durability bottoms out on S3 conditional writes (compare-and-swap) — atomic ordered appends and single-writer fencing without a separate Raft/Paxos cluster.
MUST wire the local cache (now load-bearing) to keep S3's tens-to-hundreds-of-milliseconds request latency off the read hot path; without it every read is a network call.
SHOULD support any S3-compatible durability tier (AWS S3, Cloudflare R2, MinIO) selected by the connection string / config.
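
To make "versioned by LSN" concrete, here is a minimal sketch of reconstructing a page at a given LSN from image and delta layers. It assumes a deliberately simple layer model that the spec does not define: an image layer holds a full page as a list of rows, and a delta layer appends one row. ImageLayer, DeltaLayer, and readPageAtLsn are hypothetical names.

```typescript
// Assumed layer shapes for illustration only.
interface ImageLayer { lsn: number; pageNo: number; rows: string[]; }
interface DeltaLayer { lsn: number; pageNo: number; appendRow: string; }

// Rebuild a page as of `lsn`: newest image at or below lsn, then replay later deltas in order.
function readPageAtLsn(
  images: ImageLayer[],
  deltas: DeltaLayer[],
  pageNo: number,
  lsn: number,
): string[] {
  let base: ImageLayer | undefined;
  for (const img of images) {
    const eligible = img.pageNo === pageNo && img.lsn <= lsn;
    if (eligible && (base === undefined || img.lsn > base.lsn)) base = img;
  }
  const baseLsn = base ? base.lsn : 0;
  const rows: string[] = base ? base.rows.slice() : [];

  const applicable = deltas
    .filter(d => d.pageNo === pageNo && d.lsn > baseLsn && d.lsn <= lsn)
    .sort((a, b) => a.lsn - b.lsn);
  for (const d of applicable) rows.push(d.appendRow);
  return rows;
}
```

Because layers are never mutated, reading at an older LSN simply ignores newer layers. Under this model, compaction would amount to materializing a new image at some LSN, and GC to dropping layers that no LSN inside the PITR window still needs.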

Durability is the non-negotiable invariant introduced here

A commit MUST NOT be acked from an in-memory buffer before the WAL record is durably stored via the S3 CAS append. Caching may hide read latency completely; it MUST NEVER hide commit latency, or you get acked-write loss. This invariant is the reason Experiment 4 is a hard gate for this phase.
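
The invariant can be sketched as a commit function that returns (acks) only after a conditional append has succeeded. This is a synchronous model under stated assumptions: FakeConditionalStore is a hypothetical in-memory stand-in for an object store with create-if-absent writes, the wal/<lsn> key layout is invented for illustration, and a real implementation would await a network call where this sketch calls putIfAbsent.

```typescript
// Hypothetical stand-in for an object store that supports a create-if-absent conditional write.
class FakeConditionalStore {
  private objects: { [key: string]: string } = {};

  // Succeeds only if the key does not exist yet; the losing writer gets `false`.
  putIfAbsent(key: string, body: string): boolean {
    if (Object.prototype.hasOwnProperty.call(this.objects, key)) return false;
    this.objects[key] = body;
    return true;
  }

  get(key: string): string | undefined { return this.objects[key]; }
}

// Toy commit log: one object per LSN (assumed layout, not the spec's).
class CommitLog {
  private nextLsn = 1;
  constructor(private store: FakeConditionalStore, readonly writerId: string) {}

  // Returns the LSN (the ack) only after the record is stored; throws if another writer won the slot.
  commit(record: string): number {
    const lsn = this.nextLsn;
    const stored = this.store.putIfAbsent(`wal/${lsn}`, `${this.writerId}:${record}`);
    if (!stored) {
      throw new Error(`fenced: LSN ${lsn} was already written by another writer`);
    }
    this.nextLsn = lsn + 1;   // advance only after the durable append
    return lsn;
  }
}
```

Two properties fall out of this shape: there is no code path that returns an LSN without the append having succeeded, and a stale writer that tries to reuse an LSN slot fails instead of overwriting, which is the single-writer fencing the commit-log deliverable describes.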

Specs it draws on

Storage Interface (the trait the new impl satisfies) · Object-Storage Backend (LSM page store + S3-CAS commit log) · Local Cache (now mandatory, not optional).

Exit criteria & gates

MUST open an s3:// database from the same binary as Phase 1, with no recompile, and pass the Phase 1 correctness suite against the object-storage backend.
MUST pass Benchmark Experiment 1 (single-commit latency floor, p50/p99/p999, same-region vs cross-region), which establishes the worst-case throughput ceiling for strictly sequential commits.
MUST pass Benchmark Experiment 2 (group-commit throughput curve) — proving batching lifts the plateau well above the Exp-1 ceiling (W1 beaten).
MUST pass Benchmark Experiment 4 (crash safety / durability) unconditionally: every acked commit survives kill -9 at adversarial points, no torn or half state, no acked-write loss.
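
The relationship between Experiments 1 and 2 comes down to simple arithmetic, sketched here. The model is idealized and the 50 ms figure used in the usage note is an assumed input, not a measured result: if every commit waits for one object-store round trip, throughput is capped at one commit per round trip, and batching multiplies that cap by the batch size until some other limit produces the plateau Experiment 2 measures.

```typescript
// Strictly sequential commits: one durable round trip per commit.
function sequentialCommitCeiling(commitLatencyMs: number): number {
  return 1000 / commitLatencyMs;   // commits per second
}

// Group commit: each durable round trip carries `batchSize` commits (idealized, no plateau modeled).
function groupCommitThroughput(commitLatencyMs: number, batchSize: number): number {
  return batchSize * (1000 / commitLatencyMs);
}
```

With an assumed 50 ms commit latency, sequentialCommitCeiling(50) gives 20 commits per second, and groupCommitThroughput(50, 100) gives 2000. Real curves flatten earlier than this linear model suggests, which is why the gate is an experiment rather than a calculation.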

Experiment 4 is disqualifying if failed

A fast commit path that loses an acked write under crash is disqualifying regardless of its latency numbers. Phase 2 does not exit until Exp 4 passes; durability is non-negotiable before any real data lands on the backend.

Phase 3 — Add engine-server + pgwire

server · multi-client

Wrap the same engine library in a wire-protocol listener (engine-server) speaking a subset of the Postgres wire protocol. This unlocks remote / server mode for multi-client access and for any tool that expects to talk to Postgres. Because the engine is wire-compatible, you inherit PostgREST and Bun's built-in Bun.sql client for free: no bespoke REST layer, no bespoke driver.

Deliverables

MUST ship engine-server: the Phase-1 engine library linked into a network listener; the listener is the only difference from the embedded binary.
MUST implement a pgwire subset sufficient for Bun.sql, pgbench, and PostgREST — simple + extended query, auth handshake, parameter binding, row description, and error responses.
SHOULD provide pooler guidance: place PgBouncer / pgcat in transaction mode in front of the server to absorb serverless connection bursts.
MAY document a NAPI native-addon path as an alternative client binding that works in Bun and Node from one package.

import { SQL } from "bun";
const sql = new SQL("postgres://user@host:5432/mydb");   // P3: same engine, network listener
const rows = await sql`select 1 as n`;

Specs it draws on

Server Mode & Wire Protocol (the listener + pgwire subset + pooler) · Bun Integration (Bun.sql client path) · Capabilities (PostgREST attaches in front of server mode for free).

Exit criteria & gates

MUST serve a connection from Bun.sql and from pgbench with no client-side adapter beyond a standard Postgres client.
MUST re-run Experiment 2 (group-commit curve) in server mode and confirm the plateau holds end-to-end through the wire protocol and pooler.
MUST run Experiment 3 (write-contention wall) in server mode to confirm the red-quadrant detector behaves identically over the wire (same-row writers flatten; N separate DBs scale linearly).
SHOULD demonstrate PostgREST pointed directly at engine-server, generating a REST API with no REST code written.

Phase 4 — Add the controller

scale-to-zero · instant clones

Add the lifecycle controller that cold-starts the engine on first connection, tears it down when idle (true scale-to-zero), and creates a branch as a new LSN pointer over shared immutable layers (copy-on-write → instant clones). This is what delivers the "dozens of tools, each on its own branch off one base" goal: many branches with near-zero marginal storage until they diverge.

Deliverables

MUST implement the controller's lifecycle state machine: cold-start on first connection, run, idle-detect, stop; only object-storage bytes bill at rest.
MUST implement branch operations: branch = a new LSN pointer over shared immutable layers (copy-on-write), with near-zero marginal storage until divergence.
MUST implement single-writer fencing via the commit log's CAS token, so exactly one writer per database is authoritative across cold-start / restart transitions.
SHOULD expose idle-timeout and keep-warm knobs tuned from Experiment 5's cold-read distributions.

Controller lifecycle: cold-start acquires the CAS fence and warms the cache; idle returns to STOPPED, releasing the fence while durable bytes stay on object storage.
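
The lifecycle described above can be modeled as a small state machine. Only STOPPED appears in the text; the other state names (STARTING, RUNNING, IDLE), the event names, and the ControllerModel class are assumptions made for this sketch. Fence acquisition and release are modeled as a boolean rather than a real CAS call.

```typescript
type LifecycleState = "STOPPED" | "STARTING" | "RUNNING" | "IDLE";
type LifecycleEvent = "connection" | "ready" | "last_disconnect" | "idle_timeout";

// Toy controller: tracks state, whether the writer fence is held, and how many cold starts occurred.
class ControllerModel {
  state: LifecycleState = "STOPPED";
  holdsFence = false;
  coldStarts = 0;

  handle(event: LifecycleEvent): LifecycleState {
    switch (this.state) {
      case "STOPPED":
        if (event === "connection") {
          this.holdsFence = true;      // cold start acquires the fence
          this.coldStarts += 1;
          this.state = "STARTING";
        }
        break;
      case "STARTING":
        if (event === "ready") this.state = "RUNNING";   // cache warmed, serving
        break;
      case "RUNNING":
        if (event === "last_disconnect") this.state = "IDLE";
        break;
      case "IDLE":
        if (event === "connection") {
          this.state = "RUNNING";      // still warm, no cold start
        } else if (event === "idle_timeout") {
          this.holdsFence = false;     // release the fence; only stored bytes remain
          this.state = "STOPPED";
        }
        break;
    }
    return this.state;                 // unknown events leave the state unchanged
  }
}
```

A connection arriving in IDLE returns to RUNNING without a cold start, while one arriving in STOPPED pays the full cold-start cost; the idle-timeout knob decides how often each case occurs.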

Specs it draws on

Lifecycle & Controller (state machine, branch-on-LSN, CAS fencing).

Exit criteria & gates

MUST scale to zero on idle and cold-start on the next connection, with durable state intact and the CAS fence correctly re-acquired by exactly one writer.
MUST create a branch from a base in O(1) time (a pointer write, independent of database size), with the branch's writes invisible to the base and vice versa, and storage growing only on divergence.
MUST pass Experiment 5 (cold read after scale-to-zero): warm vs forced-cold read distributions (p50/p99) characterized to set idle-timeout and keep-warm policy.
MUST pass a thundering-herd test: N concurrent cold starts (the Exp-5 extension) bounded to a documented spin-up saturation point without correctness loss.
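
The branch criterion above (a new LSN pointer over shared immutable layers) can be sketched as follows. BranchModel and PageVersion are hypothetical names, and the model stores whole page values rather than real layers; the point is only that creating a branch copies no data, and that reads fall through to the parent as of the branch point.

```typescript
interface PageVersion { lsn: number; pageNo: number; value: string; }

// Toy branch: its own writes plus a pointer (parent, branchLsn) into the parent's history.
class BranchModel {
  private own: PageVersion[] = [];
  private lastLsn: number;

  constructor(
    readonly name: string,
    private parent: BranchModel | null,
    private branchLsn: number,
  ) {
    this.lastLsn = branchLsn;
  }

  write(pageNo: number, value: string): number {
    this.lastLsn += 1;
    this.own.push({ lsn: this.lastLsn, pageNo, value });
    return this.lastLsn;
  }

  // Creating a branch records a parent pointer and an LSN; no page data is copied.
  branch(name: string): BranchModel {
    return new BranchModel(name, this, this.lastLsn);
  }

  read(pageNo: number, atLsn: number = Infinity): string | undefined {
    let best: PageVersion | undefined;
    for (const v of this.own) {
      const visible = v.pageNo === pageNo && v.lsn <= atLsn;
      if (visible && (best === undefined || v.lsn > best.lsn)) best = v;
    }
    if (best) return best.value;
    // Fall through to the parent, but never past the branch point.
    if (this.parent === null) return undefined;
    return this.parent.read(pageNo, Math.min(atLsn, this.branchLsn));
  }

  ownLayerCount(): number { return this.own.length; }
}
```

A fresh branch reports zero layers of its own, so marginal storage is zero until it writes. Writes made on either side after the branch point stay invisible to the other, because the child only reads the parent at or below branchLsn.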

Phase 5 (optional) — Capabilities

With the core platform shipped, grow capability by the build-in vs compose rule: storage/execution capabilities go into the engine, interface/service capabilities are composed around it.

MAY build vector search in-core: a vector type plus an HNSW access method behind the same Storage trait the B-trees use — so it inherits scale-to-zero, branching, and S3-backing for free (an agent can branch its memory).
MAY compose better-auth (in-process library writing to the embedded DB), PostgREST (REST layer in front of server mode), and DuckDB (OLAP engine over shared S3 Parquet/Iceberg) — none welded into the core.
SHOULD treat the HNSW cold-traversal case as the nastiest cache scenario in the engine: keep the working-set graph resident in memory, S3 as the cold floor only.

Full detail in Capabilities: Build-in vs Compose.

Build-vs-buy shopping list

Per slot: whether to build from scratch, and the existing piece to start from (as of 2026). The two build slots, the storage trait and the Bun binding, are the load-bearing original work; almost everything else is inherited or adapted.

Slot	Build from scratch?	Existing piece to start from
Engine core	No (buy)	libSQL (SQLite-compat, pluggable) / DuckDB pattern (OLAP proof) / LeanStore-Umbra (research OLTP)
Storage trait	Yes, thin (build); the key artifact	your own; a narrow trait the engine calls instead of touching disk
Page store on S3	No (buy)	SlateDB (LSM natively on object storage), RocksDB (if local-tier caching underneath)
Commit log	Mostly no (mostly buy)	S3 conditional-write (CAS) append-log designs; WarpStream-style log-on-S3 lineage
Durability	No (buy)	S3 / Cloudflare R2 / MinIO (R2 = zero egress)
Pooler (server mode)	No (buy)	PgBouncer, pgcat (transaction mode)
Wire protocol (server)	No (buy)	pgwire (Rust), jackc/pgproto3 (Go)
Bun binding	Yes, thin (build)	bun:ffi over engine.h, or a NAPI addon (Bun + Node)

Where the original engineering goes

The two genuinely-built slots are thin by design: the Storage trait (the seam that makes embeddable + disaggregated stop being contradictory) and the Bun binding (the FFI shim over the C ABI). The commit log is "mostly buy" — the S3-CAS append pattern is a 2026 design you assemble, not a cluster you operate. Everything else is an off-the-shelf piece you adapt.

Critical path & inter-phase dependencies

The phases are strictly serial on their backbone, but a few items can be developed in parallel once their prerequisite seam exists.

MUST complete Phase 1's frozen Storage trait and engine.h ABI before Phase 2 begins; the ObjectStorage impl targets the trait, and the FFI surface is reused unchanged.
MUST complete Phase 2's durable commit path (Exp 4 green) before Phase 3; server mode must not be the first place durability is exercised under load.
MUST complete Phase 2's CAS commit log before Phase 4; the controller's single-writer fencing is the CAS token, so fencing cannot exist until the log does.
SHOULD allow the pgwire subset (Phase 3) and the LSM page store (Phase 2) to be built by separate workstreams once the trait is frozen, since they touch disjoint surfaces.
MAY start the controller's branch-on-LSN design (Phase 4) during Phase 2, because branching is a property of the LSN-versioned immutable layers built there.

P1 libengine + engine.h + LocalFileStorage   [freeze Storage trait + C ABI]
       │
       ▼
P2 ObjectStorage (LSM page store + S3-CAS log) [Exp 1, 2, 4]
       │                         │
       ▼                         ▼
P3 engine-server + pgwire   P4 controller (idle-stop, branch, CAS fence)
   [Exp 2/3 server mode]        [Exp 5 + thundering-herd]
       │                         │
       └───────────┬─────────────┘
                   ▼
P5 (optional) vector-in-core · compose better-auth / PostgREST / DuckDB

Backbone is serial (P1 → P2 → {P3, P4}); Phase 2 is the shared prerequisite for both branches: its durable commit path (Exp 4 green) gates server mode, and its CAS log supplies the controller's fence token.
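
The dependency diagram can also be written down as data and checked mechanically. This is a small sketch, with the Phase type, the prerequisites map, and both helper functions introduced here for illustration; the edges mirror the diagram above, including P5 following both P3 and P4.

```typescript
type Phase = "P1" | "P2" | "P3" | "P4" | "P5";

// Edges taken from the diagram: each phase lists the phases that must be done first.
const prerequisites: { [P in Phase]: Phase[] } = {
  P1: [],
  P2: ["P1"],
  P3: ["P2"],
  P4: ["P2"],
  P5: ["P3", "P4"],
};

// A phase may start only when every prerequisite is already done.
function canStart(phase: Phase, done: Phase[]): boolean {
  return prerequisites[phase].every(p => done.indexOf(p) >= 0);
}

// All phases that are not yet done and whose prerequisites are satisfied.
function startable(done: Phase[]): Phase[] {
  const all: Phase[] = ["P1", "P2", "P3", "P4", "P5"];
  return all.filter(p => done.indexOf(p) < 0 && canStart(p, done));
}
```

With P1 and P2 done, startable returns both P3 and P4, which is the parallelism the diagram shows after Phase 2.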

Per-phase Definition of Done

A phase is Done only when its deliverables ship, its gate experiments are green, and the artifact below is demonstrable end-to-end.

Phase 1 DoD: Prebuilt libengine.* + engine.h + @twilldb/bun bind from a sample Bun app; file:// DB persists across restart; basic-correctness + MVCC snapshot suite green. C ABI declared frozen.
Phase 2 DoD: Same binary opens an s3:// DB with no recompile; correctness suite green on object storage; Exp 1, Exp 2, and Exp 4 all pass — Exp 4 unconditionally (no acked-write loss under kill -9).
Phase 3 DoD: engine-server serves Bun.sql and pgbench over the pgwire subset; Exp 2 and Exp 3 reproduce in server mode through the pooler; PostgREST demonstrated against the server with no REST code written.
Phase 4 DoD: Engine scales to zero on idle and cold-starts with the CAS fence re-acquired by exactly one writer; branch created in O(1) time (a pointer write) with copy-on-write isolation; Exp 5 distributions captured; thundering-herd (N concurrent cold starts) bounded without correctness loss.
Phase 5 DoD (optional): Vector type + HNSW index behind the Storage trait, branching with the DB; better-auth / PostgREST / DuckDB composed against the shared object-storage floor without modifying the engine core.

Milestone table

Milestone	Headline deliverable	Specs	Gate(s)	State after
M1 — Embedded	libengine + engine.h + @twilldb/bun + LocalFileStorage	02, 03, 05, 08	Basic correctness + MVCC snapshot	In-process persistent DB, zero infra
M2 — Disaggregated	ObjectStorage (LSM page store + S3-CAS commit log)	03, 04, 05	Exp 1, Exp 2, Exp 4 (durability)	Disaggregated + scale-to-zero, still embedded
M3 — Server	engine-server + pgwire subset + pooler guidance	07, 08, 12	Exp 2 / Exp 3 in server mode	Multi-client; PostgREST + Bun.sql for free
M4 — Controller	Lifecycle state machine + branch-on-LSN + CAS fencing	06	Exp 5 + thundering-herd	True scale-to-zero + instant clones
M5 — Capabilities (opt.)	Vector-in-core; compose better-auth / PostgREST / DuckDB	12	Per-capability acceptance	Platform grows by composition

Milestone timeline: M1 smallest shippable (embedded + file) → M2 connection-string flip to s3:// → M3 listener over the same library → M4 idle-stop + branch-on-LSN.

Failure modes & sequencing risks

MUST NOT let the Storage trait drift after Phase 1; a trait change forces a rewrite of LocalFileStorage and breaks the "config not rebuild" guarantee the whole sequence rests on.
MUST NOT ship Phase 2 with a commit path that acks before the S3-CAS append is durable; this is the acked-write-loss failure Exp 4 exists to catch.
SHOULD guard against Phase 3 quietly forking the engine: if engine-server embeds a divergent build, the "same engine, two front-ends" invariant is lost and bugs split across paths.
SHOULD treat the cold-start tail (Phase 4) as a first-class metric, not an afterthought; scale-to-zero means caches start cold after every idle period, and the thundering-herd case can saturate spin-up.
MAY see the object-storage-native OLTP ecosystem shift under the project (SlateDB, S3-CAS log designs, and the libSQL rewrite are the active frontier); pin dependency versions and re-run the gate suite on upgrades.

Open questions & risks

MAY need to decide NAPI-vs-FFI for the Bun binding earlier than Phase 3 if a single Bun+Node package is a hard requirement, since it changes the artifact shipped from Phase 1.
MAY want to bring branch-on-LSN forward into Phase 2 (it is a property of the immutable LSN-versioned layers) if instant clones are needed before full controller automation.
SHOULD define how much of the pgwire surface is "subset enough": the minimum for Bun.sql / PostgREST / pgbench may be smaller than full Postgres compatibility, and the line should be explicit before Phase 3.
MAY need a WASM build track (Cloudflare Workers + R2) as a parallel deliverable; per Deployment Targets this is a port, not a recompile, and is not on the native critical path above.
MUST keep the per-tool go/no-go decision (from the benchmark plan) attached to the rollout, so write-heavy hot-row outliers are routed to coupled Postgres rather than blocking a phase gate.

Related specifications

STOR · Storage Interface: the seam frozen in Phase 1 and reused unchanged across every later phase; the reason storage is config, not a rebuild.
BUN · Bun Integration: the bun:ffi binding shipped in Phase 1 and the Bun.sql client path unlocked in Phase 3.
OBJ · Object-Storage Backend: the LSM page store + S3-CAS commit log built in Phase 2; its CAS token becomes the Phase 4 fence.
SRV · Server Mode & Wire Protocol: the pgwire listener wrapped around the same library in Phase 3, plus pooler guidance.
CTL · Lifecycle & Controller: the Phase 4 lifecycle state machine, branch-on-LSN, and single-writer CAS fencing.
BENCH · Benchmark & Validation Plan: the experiments that gate Phases 2–4 (Exp 1/2/4, Exp 2/3 server, Exp 5 + thundering-herd).
CAP · Capabilities: Build-in vs Compose: the optional Phase 5 work: vector search in-core, and composing better-auth / PostgREST / DuckDB.
ARCH · Architecture Overview: the three-slot layer map this roadmap delivers slot by slot, phase by phase.