Purpose & scope

This page is the consolidation point for honest costs and operational risk across the whole specification. It does two jobs:

  • Expands the four honest tradeoffs (source §4) into impact-bearing subsections: each names the mechanism, who it bites, and the concrete mitigation, with a pointer to the spec that owns the deeper treatment.
  • Pulls the risks scattered through specs 01–13 into one risk register with stable IDs, severity, likelihood, mitigation, residual risk, and the related spec, so nothing lives only in a footnote.

It is normative for the acceptance gates that decide whether the engine is allowed to hold real data, and for the decision boundary that places each tool on the correct backend. It is not a substitute for the benchmark plan (09) or the contention strategy (10); those own the measurements and the levers. This page tracks them.

Reading order

If you read only one thing here, read “The durability non-negotiable” and the risk register. R-02 (acked-write loss) is disqualifying and gates everything else; the rest is tuning and placement.

Responsibilities & non-goals

Responsibilities

  • State every architectural cost in plain terms, without marketing softening.
  • Maintain the canonical risk register and keep risk IDs stable across spec revisions.
  • Define the go/no-go decision boundary and the “when NOT to use this engine” guidance.
  • Pin the durability invariant and the sign-off criteria that gate production data.

Non-goals

  • Re-deriving the W1/W2 mechanics or benchmark methodology — that is 09 and 10.
  • Choosing a deployment substrate — that is 11.
  • Sequencing the build — that is 13.
  • Inventing new risks not grounded in the design note; this page consolidates, it does not speculate beyond the documented architecture.

Normative requirements

  • MUST NOT acknowledge a commit to the client before its WAL records are durably stored (CAS-append acked) on the commit log. Caching may hide read latency completely; it MUST NOT hide commit latency.
  • MUST pass Experiment 4 (crash safety of the commit path, 09) unconditionally before any real data is admitted to a database. A fast commit path that loses an acked write under crash is disqualifying regardless of latency numbers.
  • MUST run Experiments 1–3 (09) and record p50/p99/p999 for a tool before that tool is placed on the disaggregated backend.
  • MUST route any tool whose writes are same-row, same-DB, high-rate, and latency-critical to coupled Postgres rather than this engine (10).
  • MUST enforce single-writer fencing per database via the commit log’s CAS token (06); a writer that loses its lease MUST be fenced before another may append.
  • SHOULD mitigate cold start with a keep-warm ping or a longer idle timeout for latency-sensitive tools, tuned from Experiment 5 (06, 09).
  • SHOULD co-locate compute and object store in the same region/provider to avoid cross-cloud egress and latency surprises (11).
  • SHOULD keep the HNSW working-set graph resident in the local cache; vector search leans hardest of anything on the cache (05, 12).
  • MAY reserve a dedicated primitive (e.g. Redis) for a tool that is both extremely contended and latency-critical, accepting the added ops/consistency cost (10).
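
The fencing and ack-ordering requirements above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the engine's API: `CommitLog`, `acquire_lease`, `cas_append`, and `FencedError` are hypothetical names, and a Python list stands in for the durable commit log owned by 06.

```python
from dataclasses import dataclass, field


class FencedError(Exception):
    """Raised when an append carries a stale token (the writer lost its lease)."""


@dataclass
class CommitLog:
    """In-memory stand-in for the durable commit log (hypothetical API)."""
    token: int = 0                                  # current fencing token
    records: list = field(default_factory=list)     # appended WAL records

    def acquire_lease(self) -> int:
        # Granting a new lease bumps the token, which invalidates every
        # previously issued token in one step.
        self.token += 1
        return self.token

    def cas_append(self, expected_token: int, record: bytes) -> int:
        # Compare-and-swap: the append lands only if the caller still holds
        # the current token. A stale writer is rejected, never interleaved.
        if expected_token != self.token:
            raise FencedError(f"stale token {expected_token}, current {self.token}")
        self.records.append(record)
        return len(self.records) - 1                # log sequence number


def commit(log: CommitLog, token: int, record: bytes) -> int:
    # The return value plays the role of the client ack. It is produced only
    # after cas_append has succeeded, so nothing is acked from a buffer.
    return log.cas_append(token, record)
```

In this model a writer that lost its lease cannot append at all: its next `commit` raises `FencedError` before any record is written, which is the behaviour the fencing requirement asks for.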

Honest tradeoff T1 — Write latency

Mechanism. A commit pays a network/S3 round-trip (or quorum/CAS ack) instead of a local fsync. The durability floor moved off the local disk and onto object storage, so the acknowledged-commit path crosses the network by construction. This is the price of disaggregation, not a bug to be cached away.

Impact
Per-commit latency rises from ~µs (local fsync) to ~ms (networked CAS ack). The tail (p99/p999) is what S3 jitter inflates, not the mean.
Who it bites
Sustained write firehoses — tools committing at high sequential rate where each commit blocks the next. Read-heavy and bursty workloads barely notice; their commits are infrequent and the read path is served from cache.
Mitigation
Group commit / queue + batch amortizes the round-trip across many concurrent commits (defeats W1). The mitigation that does not work is caching: it hides read latency but cannot hide commit latency without breaking durability. Any write-heavy tool MUST be benchmarked explicitly (Exp 1 + Exp 2) before it is placed on this engine.

Benchmark, don’t assume

“Bursty barely notices, firehose must measure” is a guideline, not a guarantee. The sequential-commit ceiling is ~1/latency until group commit kicks in. See Experiment 1 (latency floor) and Experiment 2 (group-commit throughput curve) in 09.
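
The "~1/latency" ceiling is simple arithmetic, sketched below. The function names are illustrative, and the 10 ms latency used in the demo is a placeholder rather than a measured value; real numbers come from Experiments 1 and 2. The model also assumes a batch costs one round-trip regardless of size, which is optimistic.

```python
import math


def sequential_ceiling(commit_latency_s: float) -> float:
    """Commits/sec when each commit blocks the next: one commit per round-trip."""
    return 1.0 / commit_latency_s


def group_commit_ceiling(commit_latency_s: float, batch_size: int) -> float:
    """Commits/sec when one round-trip carries batch_size commits."""
    return batch_size / commit_latency_s


def min_batch_for_rate(commit_latency_s: float, target_commits_per_s: float) -> int:
    """Smallest batch size whose ceiling reaches the target write rate."""
    return max(1, math.ceil(target_commits_per_s * commit_latency_s))


if __name__ == "__main__":
    latency = 0.010  # placeholder: 10 ms per networked commit
    print("sequential ceiling:", sequential_ceiling(latency), "commits/s")
    print("batch of 50:", group_commit_ceiling(latency, 50), "commits/s")
    print("batch needed for 250 commits/s:", min_batch_for_rate(latency, 250))
```

With these placeholder numbers a strictly sequential writer tops out near 100 commits/s, which is why a firehose has to be measured rather than assumed.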

Honest tradeoff T2 — Single writer per database

Mechanism. Without an added coordination layer, the engine permits one writer at a time per database — the SQLite-lineage ceiling. The commit log’s CAS token fences the single writer; there is no built-in multi-writer consensus.

Impact
Write concurrency within a single database does not scale by adding writers; it scales by adding databases (lanes). A naive per-DB lock means every writer in the DB serializes, even on unrelated rows.
Who it bites
One giant, high-concurrency single-database OLTP workload. It does not bite the target shape — many small databases / dozens of tools, each its own lane.
Mitigation
Sharding: many small DBs = many independent lanes (defeats W2). Database boundaries substitute for row-level lock granularity (10). The mitigation that does not work: batching — it makes one lane’s commits cheap but adds zero lanes.

Correct-but-slow by default

The single lane is correct for the hot-row case — every contended write sees the prior committed value, exactly as every serious database serializes contended writers. The cost is performance on the outlier, never correctness. See 10 for the layered strategy (serialize universally, shard where you control the SQL, route the irreducible outlier to Postgres).
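
As one illustration of "shard where you control the SQL", a hot counter can be split across several lanes so that increments stop contending on one row. The class below is a generic sketch, not the pattern as specified in 10: `ShardedCounter` is a hypothetical name and a plain list stands in for one counter row per database lane.

```python
import random
from typing import Optional


class ShardedCounter:
    """A counter split across N lanes; each slot stands in for one DB's row."""

    def __init__(self, shards: int, rng: Optional[random.Random] = None):
        self.lanes = [0] * shards
        self.rng = rng or random.Random()

    def increment(self, n: int = 1) -> int:
        # Each write picks one lane, so concurrent writers mostly land on
        # different single-writer databases instead of queueing on one row.
        lane = self.rng.randrange(len(self.lanes))
        self.lanes[lane] += n
        return lane

    def value(self) -> int:
        # Reads pay the cost: the total is a sum over every lane.
        return sum(self.lanes)
```

The trade is explicit: writes spread across lanes, while reads fan out and sum. It only applies when the tool's SQL can be rewritten this way; a write that must observe the single latest value of one row remains the red-quadrant outlier.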

Honest tradeoff T3 — Cold start

Mechanism. Compute is stateless: the controller stops the engine when idle (scale-to-zero), so the first request after an idle period pays process-start plus cache-warm. Caches start cold after every idle period by design.

Impact
The first request after idle has elevated latency: process/container/isolate spin-up + a burst of cache-miss reads against object storage. Steady-state requests are unaffected.
Who it bites
Latency-sensitive tools with spiky, intermittent traffic and, at scale, the thundering-herd case of many simultaneous cold starts (e.g. 1,000 at once) saturating spin-up capacity.
Mitigation
A keep-warm ping or a slightly longer idle timeout, tuned from Experiment 5’s warm-vs-cold distributions and its “N concurrent cold starts” extension (06, 09). The trade is residual idle billing against tail latency.
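
The trade between idle billing and cold starts can be estimated from a request trace before running Experiment 5. The sketch below is a simplification: it treats requests as instantaneous, assumes the engine stops exactly `idle_timeout_s` after the last request, and uses illustrative function names.

```python
def count_cold_starts(arrivals_s, idle_timeout_s):
    """Number of requests that find the engine stopped."""
    cold, last = 0, None
    for t in sorted(arrivals_s):
        # The very first request, and any request arriving after the idle
        # timeout has elapsed, pays the cold start.
        if last is None or t - last > idle_timeout_s:
            cold += 1
        last = t
    return cold


def billed_idle_seconds(arrivals_s, idle_timeout_s):
    """Seconds the engine stays up with nothing to do (the keep-warm cost)."""
    times = sorted(arrivals_s)
    if not times:
        return 0.0
    idle = 0.0
    for prev, cur in zip(times, times[1:]):
        # Between two requests the engine idles for the gap, capped at the
        # timeout, after which it is stopped and no longer billed.
        idle += min(cur - prev, idle_timeout_s)
    return idle + idle_timeout_s  # trailing idle after the final request
```

For the trace `[0, 10, 400, 405, 2000]` a 60 s timeout gives 3 cold starts and 195 s of billed idle, while a 600 s timeout gives 2 cold starts and 1605 s of billed idle. That is the trade stated above, in numbers.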

Honest tradeoff T4 — Maturity

Mechanism. Object-storage-native OLTP is the active frontier, not a settled field. The load-bearing pieces — SlateDB (LSM on object storage), S3-CAS commit-log designs, the libSQL rewrite — are fast-moving and have fewer battle-tested guarantees than a coupled Postgres.

Impact
Fewer hardened guarantees, thinner operational track record, APIs and durability semantics still evolving. Bugs in dependencies become our correctness bugs.
Who it bites
Anyone treating the engine as a drop-in Postgres replacement for mission-critical data on day one, and the team carrying the integration as dependencies churn.
Mitigation
Keep the Storage trait the fixed point so backends are swappable (03); pin and vet dependency versions; lean on the loom/simulation and crash-injection testing (Exp 4) rather than trusting upstream maturity; start tools on this engine that tolerate the risk profile, and reserve coupled Postgres for the ones that don’t.

The two weak axes — recap

The architecture has exactly two weak axes. They are routinely conflated; they are different bottlenecks with different levers. Conflating them leads to applying the wrong mitigation and concluding the architecture “doesn’t work” when the real fault was the lever choice.

| Axis | Mechanism | Correct lever | Lever that does NOT help | Owns it |
|------|-----------|---------------|--------------------------|---------|
| W1: commit latency | each commit = network/S3 round-trip (~ms) instead of local fsync (~µs) | group commit / queue + batch (and read cache for reads) | caching cannot hide commit latency without breaking durability | 09 |
| W2: write serialization | single writer per database (SQLite lineage) | many small DBs = many lanes (sharding) | batching adds no lanes; it only makes one lane’s commits cheap | 10 |

The one correction to internalize

The queue defeats W1 (latency); the sharded many-DB design defeats W2 (lanes). Combined, group commit batches the expensive handoffs and many-small-DBs supplies independent lanes, making the coarse networked engine behave close to a fine-grained local one for most workloads. The only genuinely unsolvable case is many concurrent writers contending on the same row in the same database — that tool belongs on coupled Postgres.

Consolidated risk register

Every risk scattered through the spec, gathered here with a stable ID. Impact and Likelihood are coarse (Low/Med/High, plus Critical for a disqualifying impact, as in R-02). Residual is the risk that remains after the listed mitigation is in place. IDs are stable across revisions; add new risks with new IDs, never renumber.

| ID | Risk | Category | Impact | Likelihood | Mitigation | Residual | Related |
|----|------|----------|--------|------------|------------|----------|---------|
| R-01 | Commit-latency tail: S3/CAS jitter inflates p99/p999 commit latency beyond a tool’s tolerance | Performance | High | High | Group commit / queue + batch; measure p99/p999 (Exp 1+2) per tool; same-region object store | Med: tail never fully removed; sequential firehose still latency-bound | 09 |
| R-02 | Acked-write loss under crash: process killed after CAS-append/ack but before durability/materialization, losing an acknowledged commit | Correctness | Critical | Low | Durability invariant (never ack before WAL durable); fencing; gate via Experiment 4 (adversarial kill -9 + restart verification), loom/simulation | Low, but disqualifying if it ever recurs; zero tolerance | 09, 02 |
| R-03 | Hot-row red-quadrant tool placed on the wrong backend: same-row, high-rate, latency-critical writes assigned to the disaggregated engine | Performance | High | Med | Exp 3 contention wall as red-quadrant detector; route outlier to coupled Postgres; offer shard-counter / event-log patterns where SQL is controlled | Low: caught at benchmark gate; correctness never at risk, only speed | 10, 09 |
| R-04 | Thundering-herd cold starts: many databases spin up simultaneously after idle, saturating spin-up capacity | Operational | Med | Med | Keep-warm ping / longer idle timeout; tune from Exp 5 “N concurrent cold starts”; admission/queueing at controller | Med: tail under correlated spikes; residual idle billing if kept warm | 06, 11 |
| R-05 | Cross-cloud egress surprise: compute and bucket in different clouds/regions bill cache-miss reads as egress at a premium | Operational | Med | Med | Co-locate compute + object store; prefer R2 (zero egress) for read-on-miss workloads; budget egress from miss-rate × cold-start frequency | Low once co-located; per-operation fees remain | 11 |
| R-06 | WASM port effort/cost for Workers: Cloudflare Workers run only JS/WASM, so the engine is a port (not a recompile); WASM isolates start slower and run larger | Maturity | Med | High (if edge target chosen) | Treat WASM as a scoped port; route storage through the Worker R2 binding (not raw S3 SDK); accept single-threaded/no-FS constraints; default to native (FFI) targets unless edge is required | Med: ongoing maintenance of two build forms; WASM perf gap | 11, 08 |
| R-07 | HNSW cold-traversal latency: vector index is a graph; a cold traversal is many sequential cache-miss round-trips over object storage (cache-miss-is-egress at its worst) | Performance | High | Med | Keep working-set graph resident in local cache; S3 as cold floor only; usearch/hnswlib lineage adapted to paged storage; warm before serving | Med: first-touch / post-eviction traversals stay slow | 12, 05 |
| R-08 | Dependency immaturity: SlateDB / S3-CAS log / libSQL rewrite are fast-moving frontier components with thinner guarantees than coupled Postgres | Maturity | High | Med | Keep Storage trait the swappable seam; pin + vet versions; crash-injection + loom rather than trusting upstream; reserve Postgres for data that can’t absorb the risk | Med: frontier churn persists; correctness rests on our own tests | 03, 04 |
| R-09 | Lost-lease split-brain writer: a writer believes it still holds the lease after losing it, two writers append to one DB | Correctness | High | Low | Single-writer fencing via commit-log CAS token; the stale writer’s CAS fails and is fenced before any second appender succeeds | Low: fencing makes split-brain a fenced rejection, not corruption | 06, 04 |
| R-10 | Cache / NVMe exhaustion: shared-buffer + local file cache (LFC) outgrows local NVMe; eviction thrash collapses to all-network reads | Operational | Med | Med | Size LFC to working set; eviction policy + spill bounds; monitor miss rate and NVMe headroom; scale instance or shard DBs before saturation | Med: pathological working sets still degrade to network latency | 05 |

R-02 is the gate

R-02 (acked-write loss) is the highest-severity entry in the register and the only one with zero tolerance. It is gated by Experiment 4 and must pass before any real data is admitted. All other risks are managed; R-02 is a hard blocker.

Risk categories & severity model

The register uses four categories and a coarse severity model so risks can be triaged consistently.

Correctness
data integrity — highest severity, zero tolerance for acked-write loss
Performance
latency/throughput on the wrong side of the boundary
Operational
cost, capacity, cold-start, egress — managed, not blocking
Maturity
frontier dependencies / port effort — hedged via the trait seam

Severity ordering. Correctness > Performance > Operational > Maturity for blocking decisions: a Correctness risk that fails its gate stops the release; Performance risks move a tool to a different backend; Operational and Maturity risks are tracked and budgeted. Likelihood is reduced by mitigation; Impact is intrinsic to the mechanism and generally is not.

When NOT to use this engine

The platform is not all-or-nothing — benchmark the boundary once, then place each tool on the correct side. Two shapes belong on coupled Postgres, not on this engine:

  • MUST NOT host one giant, high-concurrency, single-database OLTP workload on this engine. The single-writer-per-DB ceiling (W2) cannot be sharded away when the workload is inherently one database, and per-DB locking serializes unrelated writers.
  • MUST NOT host the irreducible outlier — same-row, same-DB, high-rate, latency-critical writes. This is the red quadrant: contention is genuinely sequential, batching can’t add lanes, and the ~10ms networked handoff is paid per contended write. Route it to coupled Postgres.

Where it lands well

Most small / bursty / read-heavy internal tools land green: many small databases (independent lanes), reads served from cache, commits infrequent or batchable. The occasional write-heavy outlier gets a different backend. The architecture follows the workload — it does not demand the workload conform to it.

Decision boundary summary

The per-tool go/no-go rule (from 09), restated as the canonical decision boundary:

                       run Exp 1 + Exp 2  (latency floor + group-commit curve)
                                  │
            p99 single-commit acceptable for the tool's write frequency?
            AND group-commit throughput > the tool's aggregate write rate?
                          ┌───────┴───────┐
                        yes               no ──────────────┐
                          │                                │
                  run Exp 3 (contention wall)              │
                          │                                │
        heavy CONTENDED writes (same rows, sequential,     │
        can't batch)?  → RED QUADRANT                      │
                  ┌───────┴───────┐                        │
                 no              yes ──► coupled Postgres ◄─┘
                  │                       (that one tool)
                  ▼
        FITS — ship on disaggregated engine
          (Exp 4 MUST already have passed — durability gate)

Per-tool placement: latency & throughput decide fit; contention detects the red quadrant; Exp 4 is the unconditional durability gate behind all of it.
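
The flow above can be written as a small function, which is handy for recording placement decisions next to the benchmark results. This is a sketch of the diagram only: the field names and the per-tool p99 budget are assumed inputs, since the numeric thresholds are still an open question.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    DISAGGREGATED = "disaggregated engine"
    COUPLED_POSTGRES = "coupled Postgres"


@dataclass(frozen=True)
class ToolMeasurements:
    p99_commit_ms: float          # Exp 1: measured single-commit p99
    p99_budget_ms: float          # assumed input: what the tool tolerates
    group_commit_tps: float       # Exp 2: measured group-commit throughput
    aggregate_write_tps: float    # the tool's aggregate write rate
    heavy_contended_writes: bool  # Exp 3 verdict: same rows, sequential, unbatchable


def place_tool(m: ToolMeasurements, exp4_passed: bool) -> Backend:
    # Step 1: latency floor and group-commit curve (Exp 1 + Exp 2).
    fits_latency = m.p99_commit_ms <= m.p99_budget_ms
    fits_throughput = m.group_commit_tps > m.aggregate_write_tps
    if not (fits_latency and fits_throughput):
        return Backend.COUPLED_POSTGRES
    # Step 2: contention wall (Exp 3) detects the red quadrant.
    if m.heavy_contended_writes:
        return Backend.COUPLED_POSTGRES
    # Step 3: a tool may only ship on this engine behind the durability gate.
    if not exp4_passed:
        raise RuntimeError("Exp 4 (durability gate) has not passed; cannot ship on this engine")
    return Backend.DISAGGREGATED
```

Note that the Exp 4 check sits only on the "fits" branch: a tool routed to coupled Postgres does not depend on this engine's durability gate.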

| Quadrant | Write shape | Backend | Why |
|----------|-------------|---------|-----|
| Green | read-heavy / bursty / independent-row writes across many small DBs | this engine | cache serves reads; group commit + many lanes handle writes |
| Amber | sustained independent-row write firehose, single DB | this engine (benchmark first) | group commit may suffice; Exp 1+2 decide |
| Red | same-row, same-DB, high-rate, latency-critical | coupled Postgres | contention is sequential; networked handoff too costly per write |

The durability non-negotiable

One invariant overrides every performance consideration in this spec:

Durability rule

Never acknowledge a commit from an in-memory buffer before its WAL is durably stored. Caching hides read latency completely; it must never hide commit latency — doing so produces acked-write loss (R-02 / Experiment 4). A fast commit path that loses an acked write under crash is disqualifying, regardless of latency numbers.

Concretely, the commit path crosses two adversarial points that Experiment 4 attacks with kill -9:

(a) after CAS-append issued, before client ack
On restart: the commit is either fully durable (and MUST be replayed/visible) or never acked — no torn or half state, no client believing a lost write succeeded.
(b) after client ack, before page materialization
On restart: every acked commit MUST be present and recoverable from the durable log; page materialization is a derivable, retryable step, not a durability boundary.

Crash injection MUST be deterministic (seeded) and integrate with the loom/simulation testing. This gate is unconditional and runs before any latency optimization is accepted.
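
A toy version of a seeded crash-injection trial is sketched below to make the two crash points concrete. It is a simulation under assumptions, not Experiment 4 itself: `Engine`, `Crash`, and `run_trial` are hypothetical names, a Python list stands in for the durable log, and nothing here exercises loom. The `ack_before_durable` flag deliberately reintroduces the forbidden ordering so the check can be seen to fail.

```python
import random


class Crash(Exception):
    """Simulated kill -9 at an injected point."""


class Engine:
    def __init__(self, durable_log, ack_before_durable=False):
        self.log = durable_log                    # survives restarts
        self.ack_before_durable = ack_before_durable
        self.pages = {}                           # volatile, rebuilt on start
        for key, value in self.log:               # recovery: replay the log
            self.pages[key] = value

    def commit(self, key, value, acked, crash_at=None):
        if self.ack_before_durable:
            # Forbidden ordering: the client is told "ok" from memory.
            acked.append((key, value))
            if crash_at == "a":
                raise Crash()
            self.log.append((key, value))
        else:
            self.log.append((key, value))         # durable append first
            if crash_at == "a":                   # point (a): before client ack
                raise Crash()
            acked.append((key, value))            # client ack
        if crash_at == "b":                       # point (b): before materialization
            raise Crash()
        self.pages[key] = value                   # derivable, retryable step


def run_trial(seed, n_commits=200, ack_before_durable=False):
    """Return True if every acked commit is recoverable after all crashes."""
    rng = random.Random(seed)                     # seeded, so failures replay
    log, acked = [], []
    engine = Engine(log, ack_before_durable)
    for i in range(n_commits):
        crash_at = rng.choice([None, None, "a", "b"])
        key = f"k{rng.randrange(10)}"
        try:
            engine.commit(key, i, acked, crash_at)
        except Crash:
            engine = Engine(log, ack_before_durable)   # restart and replay
    engine = Engine(log, ack_before_durable)           # final restart
    durable = set(log)
    return all(entry in durable for entry in acked) and engine.pages == dict(log)
```

Because the trial is driven by a seed, a failing run can be replayed exactly, which is the property the determinism requirement is after. With the correct ordering the check holds for every seed; with `ack_before_durable=True` a crash at point (a) leaves an acked entry missing from the log and the trial reports failure.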

Acceptance criteria & sign-off

Definition of done for the risk posture — the engine may hold real data only when all of the following hold:

  • MUST — Experiment 4 passes: zero acked-write loss, no torn/half state across both adversarial crash points, verified on restart (R-02, R-09).
  • MUST — Single-writer fencing demonstrated: a writer that loses its lease is fenced (its CAS fails) before any second writer can append (R-09).
  • MUST — Per-tool placement recorded: Exp 1+2 results on file, and Exp 3 run for any tool with plausible contention, before that tool goes live (R-01, R-03).
  • MUST — No red-quadrant tool on the disaggregated backend; the red-quadrant detector (Exp 3) reviewed for each onboarded tool (R-03).
  • SHOULD — Cold-start budget set from Exp 5 (warm/cold distributions + N-concurrent extension); keep-warm/idle-timeout configured for latency-sensitive tools (R-04).
  • SHOULD — Egress model validated: compute and object store co-located, or R2 zero-egress confirmed; per-operation fees budgeted (R-05).
  • SHOULD — Cache sizing validated against working set; NVMe headroom and miss-rate monitoring in place (R-07, R-10).
  • MUST — Every register risk has an owner and a current Residual rating; R-02 reviewed at every release.

Sign-off

Sign-off requires: (1) the durability gate green, (2) the decision boundary applied to every onboarded tool, and (3) the risk register reviewed with no open Critical/High residual lacking a mitigation owner. Maturity (T4 / R-08) is acknowledged, not closed — it is carried as an accepted, monitored risk hedged by the swappable storage seam.
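
The three sign-off conditions can be checked mechanically against the register. The sketch below assumes a minimal record shape (`Risk` with a residual rating and an optional owner) and a list of tools that have not yet been through the decision boundary; both are illustrative and not a prescribed schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Risk:
    risk_id: str
    residual: str            # "Low", "Med", "High", or "Critical"
    owner: Optional[str]     # mitigation owner, None if unassigned


def signoff_blockers(exp4_green: bool,
                     unplaced_tools: List[str],
                     register: List[Risk]) -> List[str]:
    """Return the reasons sign-off is blocked; an empty list means sign-off."""
    blockers = []
    # (1) the durability gate must be green.
    if not exp4_green:
        blockers.append("durability gate (Exp 4) is not green")
    # (2) the decision boundary must be applied to every onboarded tool.
    for tool in unplaced_tools:
        blockers.append(f"decision boundary not applied to {tool}")
    # (3) no open Critical/High residual may lack a mitigation owner.
    for risk in register:
        if risk.residual in ("Critical", "High") and not risk.owner:
            blockers.append(f"{risk.risk_id}: {risk.residual} residual with no mitigation owner")
    return blockers
```

Returning the list of reasons instead of a bare boolean keeps the review actionable: each entry names the gate, tool, or risk that still needs attention.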

Open questions

  • What numeric p99/p999 commit-latency threshold separates Amber from Red per tool class — a single platform default, or per-tool budgets? (depends on Exp 1 data)
  • At what concurrency does the thundering-herd spin-up saturate, and is admission control needed at the controller? (Exp 5 “N concurrent cold starts”)
  • Is the WASM port’s cold-start and size penalty acceptable for the edge target, or does it confine Workers+R2 to read-mostly tools? (R-06)
  • What is the eviction policy and LFC sizing heuristic that keeps the HNSW working set resident without starving B-tree pages? (R-07, R-10)
  • How are dependency-immaturity regressions (SlateDB / CAS-log upstream) detected before they reach the durability gate — continuous crash-injection in CI? (R-08)

Related specifications

Serverless OLTP Engine — internal development specification. Draft, 2026-06-20. · Author