11 Draft v0.1 Updated 2026-06-20

Deployment Targets

The same stack (stateless scale-to-zero compute, the embedded engine, and an object-storage durability floor) runs unchanged across substrates that differ on exactly two axes that matter: whether the engine ships as a native binary or as WebAssembly, and whether reading a page back on a cache miss costs egress. Everything else is portable, so you pick the target first, which fixes native-or-WASM, and let the architecture follow the workload across vendors rather than binding to any one.

Purpose & scope

This page is the buildable expansion of source §10 (Deployment targets). It defines where the stack runs, what each substrate demands of the build, and how to choose. The architecture — stateless compute over the object-storage backend, fronted by the local cache, driven by the lifecycle controller — is identical on every target. Only two variables change:

engine form: native FFI library vs compiled-to-WASM module
egress cost: price of a cache-miss read (per-GB out, or $0 on R2)

Scope: the native-vs-WASM fork and what it forces; the target matrix (Cloud Run + GCS, Cloud Run + S3, Cloudflare Workers + R2, generic VM + any S3); why R2 zero-egress is structurally — not marginally — important for a read-on-miss engine; the three concrete scenario fits; the portability principle that makes substrate a per-workload choice; per-target requirements and checklists; an egress cost model; failure modes; acceptance criteria; and open questions. Out of scope: the engine internals (Engine Core), the storage backend implementation (Object-Storage Backend), and the Bun binding mechanics (Bun Integration) — this page consumes those, it does not redefine them.

Responsibilities & non-goals

Responsibilities

MUST define, for every supported substrate, which engine form is required (native or WASM) and which object store it pairs with.
MUST keep the compute platform and the object store independently swappable, with the engine and its storage trait as the fixed point.
MUST make the engine form the first decision, because it determines the build pipeline before any other deployment choice.
SHOULD document the egress consequence of each pairing, since cache-miss rate couples directly to the egress bill on metered stores.
SHOULD provide a per-target checklist a senior engineer can execute to stand up the stack on that substrate.

Non-goals

MUST NOT introduce a second source of truth: every target shares one object-storage durability floor; compute everywhere is stateless.
MUST NOT bind the architecture to a managed database or a single vendor — portability across substrates is the explicit design goal.
MAY leave provider-specific tuning (instance sizes, idle timeouts, region pinning) to deployment configuration rather than the spec.
Multi-region active/active and cross-region replication are out of scope here; single-writer-per-DB (source §4) and same-region compute↔bucket co-location are assumed.

The one fork that decides everything: engine form

There is exactly one branch in this spec that changes what you build, not merely how you configure it: whether the target runs native code or only JavaScript/WASM. Decide it before anything else, because it selects the build pipeline.

Pick the target first — it dictates native vs WASM

The native path is a recompile of the engine core for the target triple. The WASM path is a port: a different storage backend (Worker binding, not an S3 SDK), no native filesystem, and a single-threaded execution model. You cannot defer this choice — it determines the toolchain, the storage backend implementation, and the cold-start profile. Choosing the target first means choosing native or WASM first.

Native targets — the default, drop-in path

Cloud Run, any VM, any container host. The engine runs as a native Rust library linked into Bun via bun:ffi, exactly as specified in Bun Integration (source §5). No porting; the cdylib/staticlib outputs and the engine.h C ABI bind directly. This is the most drop-in path and the recommended default.

MUST build the engine as a native library for the target triple (e.g. x86_64-unknown-linux-gnu / aarch64-unknown-linux-gnu) and link it into the Bun process via FFI.
MUST use the ObjectStorage backend talking to the bucket over the native S3-compatible client (GCS, S3, MinIO, B2).
SHOULD use the two-tier cache in full, including the NVMe local-file cache, since a native host has a real filesystem.
MAY run multi-threaded; the executor and background compaction/GC use OS threads as on any server.

WASM targets — Cloudflare Workers, a real but ported path

Cloudflare Workers run only JavaScript and WebAssembly; native-code binaries cannot be uploaded. The engine therefore must be compiled to WASM, and three structural constraints follow: binding-backed storage, no native filesystem, and a single-threaded executor. SQLite-lineage engines (the recommended starting point, source §2.A/§6) do compile to WASM, so this is a genuine path, but it is a port, not a recompile, and a WASM Worker is larger and starts more slowly than its JS-only equivalent.

MUST compile the engine to WebAssembly (e.g. wasm32-unknown-unknown / WASI-subset) and load it inside the Worker isolate — no native FFI on this target.
MUST implement the storage backend against the Worker's R2 binding (env.BUCKET), not a raw S3 SDK; the binding is the only sanctioned object-store access from a Worker.
MUST NOT assume a native filesystem: there is none. The NVMe local-file cache tier MUST be disabled (lfc.enabled = false, see Local Cache); only the in-isolate tier-1 cache is available.
MUST NOT assume threads: Workers are single-threaded. The executor MUST run single-threaded; no shared-memory parallelism, no background OS threads.
SHOULD budget for the port as explicit work (toolchain, binding-backed storage, single-thread executor) and expect a larger module and a slower cold start than the JS-only Worker baseline.

Both paths converge on the same object-storage floor and the same storage trait; the fork is in the build, not the architecture.
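The fork can be captured as a small lookup from engine form to the constraints it forces. The following is a minimal TypeScript sketch, not the engine's real configuration schema: the names `EngineForm`, `BuildProfile`, `profileFor`, and `violations` are hypothetical, and only the `lfc.enabled` setting is named by this spec.

```typescript
// Hypothetical types: illustrative names, not the engine's actual config schema.
type EngineForm = "native" | "wasm";

interface BuildProfile {
  engineForm: EngineForm;
  storageAccess: "s3-compatible-client" | "r2-binding";
  lfcEnabled: boolean;    // NVMe local-file cache tier (lfc.enabled in the spec)
  multiThreaded: boolean; // whether the executor may use OS threads
}

// Default profile each engine form implies, following the rules listed above.
// Native values are defaults (SHOULD / MAY); WASM values are hard constraints.
function profileFor(form: EngineForm): BuildProfile {
  if (form === "native") {
    return {
      engineForm: "native",
      storageAccess: "s3-compatible-client",
      lfcEnabled: true,
      multiThreaded: true,
    };
  }
  return {
    engineForm: "wasm",
    storageAccess: "r2-binding",
    lfcEnabled: false,
    multiThreaded: false,
  };
}

// Report which WASM-target MUST / MUST NOT rules a proposed profile breaks.
function violations(p: BuildProfile): string[] {
  const out: string[] = [];
  if (p.engineForm === "wasm") {
    if (p.lfcEnabled) out.push("LFC must be disabled on WASM targets (no filesystem)");
    if (p.multiThreaded) out.push("executor must be single-threaded on WASM targets");
    if (p.storageAccess !== "r2-binding") out.push("storage must use the R2 binding on Workers");
  }
  return out;
}
```

A deployment tool could call `violations` on a hand-edited profile before building, so that a forbidden combination such as LFC on Workers is caught at configuration time rather than at runtime.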

Target matrix

The four supported substrates. Native (FFI) is the default engine form; WASM is the build outlier.

Target	Compute	Object store	Engine form	Egress on cache-miss reads	Notes
Cloud Run + GCS	native container, scales to zero	Google Cloud Storage	native (FFI)	GCS egress applies	most drop-in; native engine + better-auth in one Bun process; cold start = container spin-up + cache warm
Cloud Run + S3	native container, scales to zero	AWS S3	native (FFI)	S3 egress applies (cross-cloud = pricier)	as above; watch cross-cloud egress if compute and bucket sit in different clouds
Cloudflare Workers + R2	Worker isolate, scales to zero	R2	WASM	$0 — R2 has zero egress	edge-native; engine MUST be WASM; R2 egress-free is a structural win for a read-on-miss engine
Generic VM + any S3	your container, always-on or manual	any S3-compatible (incl. MinIO, B2)	native (FFI)	varies by provider	full control, you own ops; portable everywhere

Reading the matrix

Three of four targets are native-FFI; only Workers requires the WASM port. The object-store column is interchangeable on the native rows — Cloud Run can point at GCS, S3, MinIO, or B2 with no architecture change, because the difference is entirely a configuration string on the storage trait (s3://… / gcs://…). The egress column is the cost axis; the engine-form column is the build axis.
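To illustrate how small the object-store swap is on the native rows, the sketch below parses a store configuration string into its parts. It is an assumption-laden example: `parseStoreUrl` and `StoreLocation` are hypothetical names, and the `scheme://bucket/prefix` shape is assumed beyond the `s3://` and `gcs://` schemes the text mentions.

```typescript
// Hypothetical helper: only the s3:// and gcs:// schemes come from the spec;
// the bucket/prefix layout after the scheme is an assumption for illustration.
type StoreScheme = "s3" | "gcs";

interface StoreLocation {
  scheme: StoreScheme;
  bucket: string;
  prefix: string; // empty string when the URL names only a bucket
}

function parseStoreUrl(url: string): StoreLocation {
  // scheme, then "://", then a bucket with no slashes, then an optional prefix
  const m = /^(s3|gcs):\/\/([^\/]+)(?:\/(.*))?$/.exec(url);
  if (m === null) {
    throw new Error(`unsupported store URL: ${url}`);
  }
  return {
    scheme: m[1] as StoreScheme,
    bucket: m[2] ?? "",
    prefix: m[3] ?? "",
  };
}
```

With this shape, moving a native target from one store to another changes only the string passed in, for example from `gcs://example-bucket/db1` to `s3://example-bucket/db1` (both bucket names are placeholders).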

Why R2 zero-egress specifically matters for THIS engine

Egress-free storage removes the one cost that scales with miss rate

This engine is disaggregated: object storage is the durability floor, and the engine reads pages back on every cache miss (see Local Cache). On S3 or GCS those reads are egress — billed per GB out — so the engine's cache-miss rate is directly coupled to the egress bill. Worse, scale-to-zero produces a cold cache after every idle period (source §4; Experiment 5): cold cache → more misses → more egress. R2 charges zero egress at any volume, which severs that coupling entirely — a cache miss costs only the per-operation fee, never bandwidth. For an engine whose whole model is "read back on miss," egress-free storage is not a marginal saving; it removes the single cost that grows with your miss rate.

The causal chain, made explicit, is the reason R2 is privileged in this design rather than treated as just another bucket:

   scale-to-zero  ──▶  idle ──▶ cold cache on next request
                                      │
                                      ▼
                            higher cache-miss rate
                                      │
                       ┌──────────────┴──────────────┐
                       ▼                              ▼
            S3 / GCS: every miss = egress      R2: egress = $0
            cost ∝ miss rate (couples)         cost = per-op fee only (decoupled)

On metered stores the cost line tracks the miss line; R2 flattens it. Scale-to-zero, the feature that makes the platform cheap at rest, is exactly what manufactures the misses.

R2 trade-off — it is not free of all charges

Zero egress does not mean zero cost. R2 still bills per-operation Class A (writes/list) and Class B (reads) fees, and it lacks some S3-specific features. A read-on-miss engine still pays the Class B op cost per miss; that cost is small and flat per operation, but it exists. The win is specific and structural: bandwidth — the component that scales with miss volume — is removed, not all cost.

Consequence for the cache SLO: on S3/GCS the cache hit-ratio target is partly a cost SLO, since each miss is billable bandwidth; the cache.miss.object_reads counter doubles as a spend signal. On R2 the cache remains a latency optimization but stops being an egress-spend lever, which de-risks the scale-to-zero cold-cache regime structurally.
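Treating the hit ratio as a cost SLO means a spend budget can be turned into a minimum hit-ratio target. The sketch below inverts the egress relationship under stated assumptions: `minHitRatioForBudget` is a hypothetical helper, a decimal gigabyte is assumed, and any price passed in is a placeholder rather than a quoted provider rate.

```typescript
// Assumption: decimal gigabytes; check the provider's actual billing unit.
const BYTES_PER_GB = 1e9;

// Smallest cache hit ratio (0..1) that keeps egress spend within budget.
// All arguments are per billing period; prices are caller-supplied placeholders.
function minHitRatioForBudget(
  readOps: number,
  avgBytesPerMiss: number,
  egressPricePerGB: number,
  egressBudget: number,
): number {
  // Egress-free store (the R2 case): no hit ratio is required for cost reasons.
  if (egressPricePerGB === 0) return 0;

  // Egress spend if every read missed the cache.
  const costAtZeroHits = ((readOps * avgBytesPerMiss) / BYTES_PER_GB) * egressPricePerGB;
  if (costAtZeroHits <= egressBudget) return 0;

  // Spend scales with (1 - hitRatio), so solve for the hit ratio at the budget.
  return 1 - egressBudget / costAtZeroHits;
}
```

For example, with one billion reads of 8,192-byte pages and a placeholder price of 0.10 per GB, a budget equal to one tenth of the all-miss cost yields a minimum hit ratio of about 0.9. On an egress-free store the function returns 0, which matches the point above: the cache stays a latency optimization but is no longer a spend lever.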

The three scenario fits

Three concrete deployment shapes, each mapped to a verdict and the one thing to watch.

Scenario 1 — scale to ~1,000 at peak

Verdict: fits well — if the 1,000 is spread across many databases / branches. Stateless compute plus a shared object-storage floor is a horizontal fan-out that scales near-linearly, and readers do not contend. The only poor fit is 1,000 writers funnelling into ONE database — that hits the single-writer-per-DB ceiling and the hot-row wall of source §8/§9 (Hot-Row Contention).

SHOULD distribute the 1,000 across many DBs/branches so the load fans out across independent lanes (the sharding lever, source §8 W2).
MUST NOT route 1,000 concurrent writers into a single database; that is the red quadrant and belongs on coupled Postgres.
SHOULD measure the thundering-herd case: 1,000 simultaneous cold starts. Extend Experiment 5 to "N concurrent cold starts" to find the spin-up saturation point.

Thundering-herd of cold starts

Scale-to-zero means 1,000 simultaneous first-requests-after-idle can trigger 1,000 simultaneous cold starts: container/isolate spin-up plus cache warm, all hitting the floor at once. This is a distinct failure mode from steady-state load. The benchmark plan must add a concurrent-cold-start sweep; the controller may need keep-warm or staggered admission to bound it.
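Staggered admission can be pictured as releasing cold starts in waves. The sketch below computes a start offset for each pending cold start; `staggeredAdmission`, the wave size, and the wave spacing are hypothetical tuning knobs, not controller parameters defined by this spec. It bounds concurrency only if a cold start finishes within one wave spacing, which is an assumption the concurrent-cold-start sweep would need to confirm.

```typescript
// Hypothetical controller helper: returns the delay (ms) before each pending
// cold start is admitted, releasing at most `maxPerWave` starts per wave.
function staggeredAdmission(
  pending: number,
  maxPerWave: number,
  waveSpacingMs: number,
): number[] {
  if (maxPerWave < 1) {
    throw new RangeError("maxPerWave must be at least 1");
  }
  const offsets: number[] = [];
  for (let i = 0; i < pending; i++) {
    // Requests 0..maxPerWave-1 go in wave 0, the next batch in wave 1, and so on.
    const wave = Math.floor(i / maxPerWave);
    offsets.push(wave * waveSpacingMs);
  }
  return offsets;
}

// 1,000 simultaneous first requests, 100 per wave, 200 ms apart:
// ten waves, the last of which starts at 1800 ms.
const exampleOffsets = staggeredAdmission(1000, 100, 200);
```

The trade-off is explicit in the output: the floor sees at most 100 simultaneous cold caches per wave, at the cost of up to 1.8 seconds of added admission delay for the last wave. Keep-warm attacks the same problem from the other side by reducing how many starts are cold at all.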

Scenario 2 — Cloudflare Workers + R2 as the whole database

Verdict: purest edge expression, with one explicit catch. Scale-to-zero isolates plus egress-free durability is the cleanest possible statement of the architecture: edge latency and zero bandwidth cost at the durability floor. The catch is the engine-form fork — this target requires the WASM build.

MUST ship the WASM-compiled engine and the R2-binding storage backend (no FFI, no FS, single-thread; see the engine-form section).
SHOULD budget the WASM port as explicit, scheduled work — it is the gating cost of this scenario, not a free flip.
MAY accept slower cold starts and a larger module than a JS-only Worker in exchange for edge latency and zero egress.

Scenario 3 — Bun app + better-auth (embedded) + engine = Cloud Run + bucket

Verdict: most drop-in — the direct fulfilment of the original goal. better-auth is a library that stores its state (users, sessions) in your database, so it composes with the embedded engine in one Bun process — no external auth service (see Capabilities). Cloud Run runs the native engine as-is, with no WASM constraint. Cloud Run + GCS ≈ Cloud Run + S3 ≈ the disaggregated stack: the compute platform and object store are swappable while the architecture stays identical. That portability is the entire point.

SHOULD compose better-auth in-process as a library writing to the same embedded engine — one Bun process, no separate auth deployment.
MUST run the native engine unmodified on Cloud Run (no WASM port required on this target).
MAY swap GCS ⇄ S3 ⇄ any S3-compatible store via configuration with no code change; the disaggregated stack is identical across them.

Scenario	Substrate	Engine form	Verdict	Watch
~1,000 at peak, fanned out	Cloud Run / VM + bucket	native	fits well across many DBs	thundering-herd of cold starts; never 1,000 writers → 1 DB
Edge: whole DB on the edge	Workers + R2	WASM	purest edge expression	budget the WASM port
Bun + better-auth + engine	Cloud Run + bucket	native	most drop-in, goal-direct	nothing structural — this is the baseline

Portability principle

Two swappable axes, one fixed point

The compute platform and the object store are both swappable. The engine and its storage trait are the fixed point. Choose the substrate per workload — never the other way around.

This is the structural reason the stack is not locked to a vendor. The storage trait abstracts the object store behind get_page / append_wal / flush; the deployment substrate abstracts compute behind "a process that links (native) or loads (WASM) the engine." Both are interchangeable so long as the trait contract holds. The selection rule by workload:

edge / global low-latency: Cloudflare Workers + R2, with a WASM build. Pay the port; gain edge latency and zero egress.
native / most drop-in: Cloud Run + bucket (GCS or S3). Native FFI engine, scale-to-zero, better-auth in-process. The default.
full control / own ops: generic VM + any S3-compatible store (incl. MinIO, B2). Native engine, you own the lifecycle and the egress terms.

The architecture follows the workload across substrates rather than binding to one vendor: you run your own engine against whatever object storage is cheapest or closest, never tethered to a managed database. Moving a workload between targets is a build-form and configuration change, not a re-architecture.
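The selection rule is simple enough to write down as a lookup. This is an illustrative sketch only: the `Workload` labels, the `Substrate` shape, and `chooseSubstrate` are hypothetical names that restate the three rules above.

```typescript
// Hypothetical labels for the three workload classes in the selection rule.
type Workload = "edge-low-latency" | "drop-in" | "full-control";

interface Substrate {
  compute: string;
  objectStore: string;
  engineForm: "native" | "wasm";
}

// Substrate follows workload; engine form is a consequence of the substrate.
function chooseSubstrate(workload: Workload): Substrate {
  switch (workload) {
    case "edge-low-latency":
      return { compute: "Cloudflare Workers", objectStore: "R2", engineForm: "wasm" };
    case "drop-in":
      return { compute: "Cloud Run", objectStore: "GCS or S3", engineForm: "native" };
    case "full-control":
      return { compute: "generic VM", objectStore: "any S3-compatible store", engineForm: "native" };
  }
}
```

Nothing in the lookup touches the engine or the storage trait, which is the point: the inputs are workload properties and the outputs are build form and configuration.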

Egress cost model

A back-of-envelope model the controller and capacity planning use to reason about spend. The egress bill is a function of the miss rate, the page size, and the per-GB egress price of the store.

miss_ops_per_period
    = read_ops_per_period
    × (1 − cache_hit_ratio)          # misses fall through to the floor

egress_bytes_per_period
    = miss_ops_per_period
    × avg_page_or_layer_bytes

egress_cost_per_period
    = (egress_bytes_per_period / bytes_per_GB)
    × store_egress_price_per_GB      # S3/GCS: > 0   ·   R2: 0

# plus, on every store including R2:
op_cost_per_period
    = miss_ops_per_period × class_B_read_op_price
    + write_ops_per_period × class_A_write_op_price

SHOULD treat (1 − cache_hit_ratio) as the dominant cost lever on S3/GCS: halving the miss rate halves egress spend.
SHOULD account for scale-to-zero raising the effective miss rate (cold caches after idle) when estimating S3/GCS egress; Experiment 5 supplies the cold-miss distribution.
MUST still budget per-operation (Class A/B) fees on R2 even though its store_egress_price_per_GB term is zero.
SHOULD co-locate compute and bucket in the same region/cloud on S3/GCS to avoid cross-cloud egress multipliers (Cloud Run + S3 cross-cloud is the priciest cell in the matrix).
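The model above translates directly into a small function. The sketch below is one possible encoding, assuming a decimal gigabyte; the interface and function names (`StorePricing`, `PeriodLoad`, `periodCost`) are hypothetical, and any prices supplied by a caller should be taken from the provider's current price list rather than from this page.

```typescript
// Assumption: decimal gigabytes; check the provider's actual billing unit.
const BYTES_PER_GB = 1e9;

interface StorePricing {
  egressPerGB: number;   // store_egress_price_per_GB: 0 on R2, > 0 on metered stores
  classBReadOp: number;  // class_B_read_op_price, per read operation
  classAWriteOp: number; // class_A_write_op_price, per write/list operation
}

interface PeriodLoad {
  readOps: number;         // read_ops_per_period
  writeOps: number;        // write_ops_per_period
  cacheHitRatio: number;   // between 0 and 1 inclusive
  avgBytesPerMiss: number; // avg_page_or_layer_bytes
}

interface PeriodCost {
  missOps: number;
  egressBytes: number;
  egressCost: number;
  opCost: number;
  total: number;
}

function periodCost(load: PeriodLoad, pricing: StorePricing): PeriodCost {
  if (load.cacheHitRatio < 0 || load.cacheHitRatio > 1) {
    throw new RangeError("cacheHitRatio must be between 0 and 1");
  }
  // Misses fall through to the object-storage floor.
  const missOps = load.readOps * (1 - load.cacheHitRatio);
  const egressBytes = missOps * load.avgBytesPerMiss;
  // Bandwidth term: zero whenever the store's egress price is zero.
  const egressCost = (egressBytes / BYTES_PER_GB) * pricing.egressPerGB;
  // Per-operation term: applies on every store, including R2.
  const opCost = missOps * pricing.classBReadOp + load.writeOps * pricing.classAWriteOp;
  return { missOps, egressBytes, egressCost, opCost, total: egressCost + opCost };
}
```

Running the same `PeriodLoad` through two `StorePricing` values makes the structural difference visible: with `egressPerGB` set to zero the `egressCost` term vanishes at any miss rate, while `opCost` remains and still scales with misses and writes.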

Per-target requirements & checklists

Cloud Run + bucket (GCS or S3) — native

MUST build the native engine library for the Cloud Run image's target triple and link it into Bun via bun:ffi.
MUST configure the ObjectStorage backend with the bucket URL (gcs://… or s3://…) and same-region credentials.
SHOULD enable both cache tiers, including the NVMe LFC, sized to the instance's local disk.
SHOULD set the controller idle timeout / keep-warm to balance scale-to-zero savings against cold-start p99.
MAY compose better-auth in-process for the BaaS shape (Scenario 3).

Cloudflare Workers + R2 — WASM

MUST compile the engine to WebAssembly and load it in the isolate; no native binary may be uploaded.
MUST implement storage against the R2 binding (env.BUCKET.get/put), not an S3 SDK.
MUST NOT enable the NVMe LFC (lfc.enabled = false) or spawn threads; tier-1 in-isolate cache only, single-threaded executor.
SHOULD minimize WASM module size to bound cold-start time; a WASM Worker is larger and starts more slowly than a JS-only Worker.
SHOULD rely on R2 zero egress to absorb the higher cold-cache miss rate that the no-LFC, scale-to-zero profile produces.

Generic VM + any S3-compatible store — native, self-managed

MUST link the native engine and point the backend at any S3-compatible endpoint (AWS S3, MinIO, Backblaze B2).
SHOULD own the lifecycle explicitly (always-on or manual start/stop) since there is no managed scale-to-zero.
SHOULD verify the chosen store's egress and per-op pricing against the cost model; terms vary widely by provider.
MAY use a local MinIO floor for fully self-hosted / air-gapped deployments.

Failure modes & edge cases

Failure / edge case	Target(s)	Effect	Mitigation
Thundering-herd cold starts	all scale-to-zero	N simultaneous spin-ups + cold caches saturate floor and compute	keep-warm, staggered admission, extend Exp 5 to N concurrent cold starts
Cross-cloud egress blowout	Cloud Run + S3 (split cloud)	every cache miss billed at premium cross-cloud egress	co-locate bucket with compute, or move to R2
Native binary on WASM target	Workers	build/upload rejected — Workers run no native code	compile to WASM; storage via R2 binding
LFC assumed on Workers	Workers	no filesystem → cache layer fails / falls through every read	set `lfc.enabled = false`; tier-1 only
Threaded executor on Workers	Workers	single-thread isolate cannot spawn threads	single-threaded executor build for the WASM target
1,000 writers → one DB	all	single-writer ceiling / hot-row wall; throughput collapses	fan out across DBs/branches; route the irreducible outlier to Postgres (§10)
Low hit ratio on metered store	S3 / GCS	egress bill tracks miss rate; cost SLO breached	raise hit ratio; or move to R2 to decouple cost from misses

Dependencies / existing pieces to start from

MUST reuse the same engine core and storage trait across every target — the engine is the fixed point, only its build form changes.
SHOULD reuse the object-storage backend (LSM-on-S3 page store + S3-CAS commit log) on native targets, swapping only the client/endpoint per store.
SHOULD start the WASM path from a SQLite-lineage engine known to compile to WASM (source §6, libSQL), then replace its storage with the R2-binding backend.
SHOULD use Cloud Run / a generic container as the first deployable substrate (native, drop-in) before attempting the WASM port.
MAY use MinIO as a local S3-compatible floor for development across all native scenarios.

Acceptance criteria / definition of done

MUST demonstrate the identical application running on at least two substrates (e.g. Cloud Run + GCS and generic VM + MinIO) with only configuration changes — no source changes — proving compute/object-store swappability.
MUST demonstrate the WASM-compiled engine serving reads and commits inside a Cloudflare Worker against R2 via the binding, with lfc.enabled = false and a single-threaded executor.
MUST verify the engine-form gate: a native binary upload to Workers is rejected, and a WASM build runs — documented as the build-pipeline branch.
SHOULD publish, per target, a cold-start p99 and a steady-state read/commit p99 (from Experiments 1 and 5) so targets are comparable.
SHOULD run the extended Experiment 5 "N concurrent cold starts" sweep on at least one scale-to-zero target and report the saturation point.
SHOULD confirm, by measurement, that R2 cache-miss reads incur zero egress charge while S3/GCS misses do, validating the cost model's egress term.
MUST confirm Scenario 3 end to end: a Bun process with embedded engine + in-process better-auth on Cloud Run + bucket, no external auth service.

Open questions & risks

MAY need a shared codebase strategy for the native and WASM builds (single-thread / no-FS feature flags vs separate crates) — open: how much of the executor can be conditionally compiled vs forked?
Open: what is the realistic WASM cold-start and module-size penalty vs native, and is it within the edge-latency budget for the target tools? Needs Experiment 1/5 on the WASM target.
Open: the thundering-herd saturation point (N concurrent cold starts) per substrate, and whether keep-warm or staggered admission is the better controller lever — see Lifecycle & Controller.
Risk: on S3/GCS the cache hit-ratio SLO is partly a cost SLO; a regression couples directly to spend. R2 structurally de-risks this, but its per-op Class A/B fees still accrue and must be modelled.
Risk: R2 lacking some S3-specific features could force divergence in the storage backend between R2 and S3/GCS targets — open: which features does the backend actually depend on, and are any R2-unsupported?
Risk: cross-cloud egress on Cloud Run + S3 (split cloud) can dominate cost; co-location must be enforced by deployment policy, not left to chance.

Related specifications

Local Cache: the miss rate that couples to egress; LFC must be off on the WASM/Workers target.
Lifecycle & Controller: scale-to-zero, keep-warm, and the thundering-herd cold-start lever.
Bun Integration: the native FFI binding the drop-in Cloud Run / VM targets use as-is.
Benchmark & Validation Plan: Experiment 5 cold reads, extended to N concurrent cold starts per target.
Storage Interface: the trait that is the fixed point while compute and object store swap.
Object-Storage Backend: the S3/GCS/R2 durability floor reached on every cache miss.
Hot-Row Contention Strategy: why 1,000 writers into one DB is the poor-fit case for Scenario 1.
Capabilities: Build-in vs Compose: better-auth composed in-process for the drop-in Cloud Run scenario.