Purpose & scope

This specification governs how new capabilities are added to the platform without bloating the engine. It is the placement contract: for any proposed feature it answers the binary question "build it into the engine, or compose it around the engine?" and then specifies how each placement attaches to the existing layers (engine core, storage trait, server mode, object-storage floor).

It expands source §11 into a buildable specification: the deciding rule, the one capability that is built in (vector search), the three capabilities that are composed around (better-auth, PostgREST, DuckDB), the canonical composition map, the contrast with the Postgres-plus-extensions monolith, and the normative requirements that keep the core from accreting interface code.

Relationship to the rest of the spec

This page does not introduce new storage or commit machinery. It composes the pieces already specified elsewhere: the Storage trait (vector index attaches here), server mode / pg wire protocol (PostgREST attaches here), Bun integration (better-auth attaches here), and the object-storage backend (DuckDB attaches here). Capabilities inherit branching, scale-to-zero, and S3-backing only to the extent they sit on top of those existing seams.

The deciding rule

The rule (normative)

If it is a STORAGE or EXECUTION capability — a type, an index, a query operator — build it INTO the engine.

If it is an INTERFACE or a SERVICE — auth, REST, an analytics frontend — COMPOSE it AROUND the engine.

The rule exists because the two kinds of capability have different cost models and different lifecycles. A storage/execution capability is exercised on the data path on every query; it must live where the executor and the page cache live, behind the same Storage trait, or it cannot share MVCC, snapshots, branching, and the cache. An interface/service capability is exercised at the boundary; it is request-shaped, optional, and frequently absent (an embedded deployment has no REST surface at all). Welding a boundary concern into the core would make it un-optional, un-branchable as an independent unit, and impossible to scale or idle separately.

How to classify a proposed capability

Apply this test in order; the first match decides placement.

QuestionIf yesPlacement
Does it add or change a column type the executor must read/compare/index?storage capabilitybuild IN
Is it a new access method (index) the planner can choose?storage capabilitybuild IN
Is it a query operator the executor evaluates over rows/pages?execution capabilitybuild IN
Is it an authentication / authorization service holding its own state?servicecompose
Is it a request/response surface (REST/GraphQL/RPC) over the data?interfacecompose
Is it a second engine with a different physical layout (columnar/OLAP)?execution, but separablecompose

The OLAP row at the bottom is the one subtlety: analytics is an execution capability, but it demands a columnar physical layout the row engine deliberately does not have (the HTAP boundary, see DuckDB below). When an execution capability requires a fundamentally different storage layout, it does not go into the row engine — it becomes a second engine composed over the same storage floor.

Contrast: monolith-plus-extensions vs small-core-plus-composition

Postgres is a monolith: a single process that bundles types, indexes, planner, executor, REST-via-extension, vectors-via-extension, analytics-via-extension. Everything is welded into one address space, one scaling unit, one branch unit, one fate. You take the whole thing or none of it; an extension you do not use still ships in the binary, still couples to the same WAL, still scales as one.

This engine is the opposite shape by design (storage is decoupled at the Storage trait, see Architecture Overview): a small core (rows + vectors + the storage trait) surrounded by composable layers that share one object-storage floor. Each composed piece scales, scales-to-zero, and branches independently because it was never welded into the core.

   POSTGRES (monolith + extensions)        THIS ENGINE (small core + composition)

   ┌───────────────────────────────┐       better-auth   PostgREST     DuckDB
   │  planner · executor · types   │       (in-proc lib)  (front layer) (2nd engine)
   │  ┌────────┐ ┌─────────────┐   │            │             │             │
   │  │pgvector│ │ PostgREST*  │   │            ▼             ▼             ▼
   │  ├────────┤ ├─────────────┤   │       ┌─────────────────────────────────┐
   │  │ OLAP*  │ │   auth*     │   │       │   small engine core (rows +       │
   │  └────────┘ └─────────────┘   │       │   vectors + Storage trait)        │
   │        one WAL / one fate     │       └────────────────┬──────────────────┘
   └───────────────┬───────────────┘                        │
                   ▼                        ┌───────────────▼───────────────┐
            one storage volume              │   shared object-storage floor  │
                                            │        (S3 / R2 / GCS)         │
        all scale/branch as one unit        └────────────────────────────────┘
                                              each layer scales/idles/branches
                                                       independently

Left: one welded binary, one storage volume, one fate. Right: a small core plus composable layers, all sharing one object-storage floor, each independently scalable.

Built IN — vector search (pgvector-equivalent)

Vector search is a storage/index capability, not an interface, so by the rule it belongs inside the engine. It is also the single highest-value addition for agent workloads: vector search is how agent memory and retrieval work, so this capability converts the engine from "a small SQL store" into "the substrate an agent runs its memory on."

What is added to the core

  • MUST add a vector type to the type system — a fixed-length array of f32, length declared at column definition time (e.g. vector(768)), validated on insert.
  • MUST add an HNSW index as a new access method behind the same Storage trait the B-trees already use; the HNSW graph is stored as pages, not as a side file.
  • MUST expose distance operators the planner can push into the index scan (cosine, L2, inner-product) so a top-k nearest-neighbour query is answerable by the access method, not by a full scan plus sort.
  • SHOULD reuse the usearch / hnswlib algorithm lineage adapted to paged storage rather than inventing an ANN algorithm.

Why it inherits branching, scale-to-zero, and S3-backing for free

The HNSW graph is just more pages flowing through the same Storage trait. Because the trait is the only path the index has to durable bytes, the index automatically participates in everything the trait provides:

scale-to-zero
the graph pages live in object storage like any other page; idle compute can be torn down and the index re-warmed on cold start from S3 (see Lifecycle & Controller).
branching
a branch is a new LSN pointer over shared immutable layers; the vector index is made of those layers, so branching the database branches the vector index with it. An agent forks its memory, experiments on the fork, and discards it at near-zero marginal storage cost. Postgres's pgvector cannot do this — its index is welded to a single monolithic volume.
S3-backing / PITR
the graph is versioned by LSN exactly like row pages, so point-in-time recovery and the durability floor cover it without special-casing.

The branching payoff is the differentiator

An agent that can fork its own retrieval memory, run a speculative episode against the fork, and throw the fork away is doing something pgvector structurally cannot. This capability exists only because the index rides the same versioned, immutable-layer storage trait as the rows.

The caveat: HNSW is the nastiest cache case in the engine

Cold-traversal amplification

HNSW is a graph, and traversal is random-access pointer-chasing. Over object storage a cold traversal can become many sequential round-trips — each hop is a potential cache miss, and on egress-billed stores a cache miss is egress. This is "cache-miss-is-egress" at its absolute worst, because the misses are serial (you cannot prefetch hop n+1 until hop n resolves).

Vector search therefore leans harder on the local cache than anything else in the engine. The mitigation is structural, not incidental:

  • MUST keep the working-set graph resident in the local cache; object storage is the cold floor, not the traversal medium.
  • SHOULD size the cache against the graph working set, not just the row working set — the two compete and the graph's miss penalty is multiplied by traversal depth.
  • SHOULD warm the upper HNSW layers (the small, dense entry-point layers) eagerly on cold start so the first hops never miss.
  • MAY pin frequently-queried sub-graphs to keep latency-critical retrieval off the cold path entirely.

Composed AROUND — overview

The next three capabilities are NOT built into the core. They wrap the engine, they are optional, and they are absent when the engine is embedded in-process (an embedded deployment has rows and vectors but no auth service, no REST surface, no second analytics engine unless the host app composes them). Each attaches at a different seam, and the seam determines how it scales and branches.

in-process
better-auth — library in the Bun app
in-front
PostgREST — layer over server mode
over-storage
DuckDB — second engine on shared S3

Composed AROUND — better-auth (auth)

better-auth fills the "auth" slot of the original BaaS wishlist. It is a library, not a service, and that is exactly why it composes cleanly: it stores its own state — users, sessions, accounts — in your database rather than in an external auth provider.

How it attaches

better-auth runs in-process, inside the Bun app, alongside the embedded engine. It writes users and sessions to the same in-process database via the engine's FFI/native binding (see Bun Integration). That database syncs to object storage like any other write. There is no external auth service and no second network hop for a login.

Bun process
 ├─ better-auth (library)  ──  writes users / sessions ─┐
 │                                                      │
 └─ engine (embedded lib)  ←───────────────────────────┘  ──►  object storage
        (rows + vectors,                                        (S3 / R2 / GCS)
         in-process via FFI)

better-auth is a library co-resident in the Bun process; it persists into the embedded engine, which syncs to the shared object-storage floor. No external auth service.

Behaviour & consequences

  • MUST NOT be built into the engine core — auth is a service holding its own schema, and the engine must remain auth-agnostic so it can be embedded with or without it.
  • MUST persist all auth state (users, sessions, accounts, verification tokens) through the normal engine write path so it inherits durability, branching, and PITR.
  • SHOULD live in the same database (or a same-instance branch) as the application data so a session lookup is a local function call, not a cross-service round-trip.
  • MAY be omitted entirely in deployments that do not need auth — its absence costs nothing because it was never welded in.

Why "library, in-process" is the right shape here

Because better-auth's state lives in the engine, branching the database branches the auth state with it (a staging branch gets its own user table for free), and because the lookup is in-process it does not pay the networked-commit handoff for reads. This is the most drop-in deployment shape — see Deployment Targets, scenario 3 (Bun app + better-auth on Cloud Run + bucket).

Composed AROUND — PostgREST (REST API)

PostgREST introspects a schema and auto-generates a REST API from it. It is a separate process that sits in front of the database — and that is the correct placement, because an auto-REST layer is an interface, and interfaces belong at the boundary wrapping server mode (not the embedded mode).

The key payoff: you may not build a REST layer at all

Because the engine speaks the Postgres wire protocol in server mode (see Server Mode & Wire Protocol), PostgREST can point at the engine directly. PostgREST already exists, already introspects pg-compatible catalogs, and already maps HTTP verbs to SQL. The engine inherits a mature REST surface for free — wire-compatibility means you do not write a REST layer, you adopt the existing one.

HTTP client
    │  GET /items?select=name&price=gt.10
    ▼
PostgREST  (introspects schema, maps HTTP ⇄ SQL, auto-generates REST)
    │  pg wire protocol
    ▼
engine (server mode, speaks pgwire)
    │  Storage trait
    ▼
object storage  (S3 / R2 / GCS)

PostgREST is a front layer over server mode. Because the engine is wire-compatible, PostgREST attaches with zero engine changes — the REST API is inherited, not built.

Requirements

  • MUST NOT be built into the engine — it is an interface and must remain optional and absent in embedded deployments.
  • MUST attach only over server mode; the engine's pg-wire compatibility is the contract that makes PostgREST work unmodified.
  • SHOULD be deployed as its own process/container so it scales and idles independently of the engine.
  • SHOULD sit behind the same pooler the server mode uses for connection-burst absorption (PgBouncer/pgcat, transaction mode).
  • MAY be replaced by any other pg-wire-speaking front layer (GraphQL gateways, ORMs, BI tools) for the same reason — wire-compatibility is the general unlock, PostgREST is one instance of it.

Acceptance check for "REST for free"

The deliverable is not "we built REST"; it is "unmodified PostgREST, pointed at the engine's pgwire endpoint, serves a working REST API against a real schema." If that holds, the engine's wire-compatibility is doing its job and no REST code was written.

Composed AROUND — DuckDB (analytics / OLAP)

Analytics is an execution capability, but it demands a columnar physical layout the row engine deliberately does not have. This is the HTAP boundary: do not make the row engine analytical — OLAP scans want column-at-a-time vectorized execution over compressed columnar storage, which is structurally at odds with the row-oriented, MVCC, single-writer OLTP layout the engine is built for. Bolting OLAP into the row engine would compromise both.

How it attaches: a second engine over shared storage

Instead of one HTAP engine, build an HTAP system: point DuckDB at the same object storage as Parquet/Iceberg. The row engine handles transactions; DuckDB handles aggregations; both see one S3 dataset. This is possible precisely because storage is decoupled — the durability floor is shared, so a second engine can read it without going through the first.

              ┌─ engine (row, OLTP)   ── writes ──┐
              │                                    │
 shared S3 ───┤                                    ├── one dataset
 (R2/GCS)     │                                    │
              └─ DuckDB (columnar, OLAP) ── reads ─┘

 periodic:  engine row data ──► materialize columnar
            (Parquet / Iceberg) snapshots on S3 ──► DuckDB queries them

Two engines, one storage floor. The row engine writes; DuckDB reads materialized columnar snapshots of the same S3 dataset. A legitimate HTAP system, not a single HTAP engine.

The materialization contract

  • MUST NOT add columnar/OLAP execution to the row engine; the HTAP boundary is held by separation, not by a hybrid layout.
  • MUST publish row data to the shared store in an open columnar format (Parquet, optionally catalogued via Iceberg) that DuckDB reads directly.
  • SHOULD materialize columnar snapshots periodically (batch/CDC), accepting analytic staleness bounded by the snapshot interval; OLTP stays the source of truth.
  • SHOULD run DuckDB as its own process so the analytic workload neither contends with nor blocks the transactional lane.
  • MAY expose the Iceberg table to other readers (BI tools) — the columnar snapshot is a shared artifact, not a private one.

Why this is legitimate HTAP and not a hack

It is two engines over one storage floor, each playing to its layout's strength, sharing a dataset rather than copying it through a pipeline. The decoupled storage trait is what makes a second engine on the same bytes possible — a coupled OLTP database would force an export/replicate hop instead.

The composition map

The canonical placement table. Each capability's kind determines its placement, and the placement determines how it attaches to the existing layers.

CapabilityKindPlacementHow it attaches
Vector search (pgvector-equiv) storage / index built IN new index type via the Storage trait; inherits branching + scale-to-zero + S3-backing
better-auth service (auth) composed in-process library; stores its state in the embedded engine, syncs to object storage
PostgREST interface (REST) composed in front wraps server mode; free via pg wire-compatibility, no engine changes
DuckDB execution (OLAP) composed over storage second engine reading shared S3 (Parquet / Iceberg snapshots)

Why this beats Postgres + extensions

The capabilities are the same; the bundling is the opposite. Postgres ships them welded into one binary, one WAL, one scaling unit. This engine ships them unbundled: a tool that needs vectors + auth but not REST pulls exactly those, in-process, and pays for nothing else.

PropertyPostgres + extensionsSmall core + composition
Granularity of adoptionwhole monolith; unused extensions still shippull exactly the capabilities a tool needs
Scaling unitone process scales as a uniteach layer scales / idles independently
Branchingmonolithic volume; pgvector cannot branch its indexbuilt-in capabilities branch with the DB; composed layers branch with their own state
Scale-to-zerocoupled compute + storage; not idle-friendlystateless compute over object storage; idle to zero
Embedded formnot embeddable as a librarycore embeds in-process; interface layers simply absent
Engine size / blast radiuslarge; one fate shared across featuresengine stays small; failures isolated per layer

The structural claim: each layer scales, idles, and branches independently because nothing is welded to the core. The engine stays small; the platform grows by composition around shared object storage. That is the inversion of the monolith, and it is only possible because storage was decoupled at the Storage trait in the first place.

Normative requirements

Placement discipline

  • MUST classify every proposed capability with the deciding rule before implementation: storage/execution → in-core; interface/service → composed.
  • MUST keep the engine core limited to rows, vectors, and the storage trait; no interface or service code may enter the core.
  • MUST NOT add an interface or service (auth, REST, analytics frontend) into the engine binary; such capabilities are composed and optional.
  • MUST NOT add columnar/OLAP execution to the row engine; OLAP is composed as a second engine over shared storage.

Built-in capabilities (vector search)

  • MUST implement the vector type and HNSW access method behind the existing Storage trait, storing the graph as pages so it inherits branching, scale-to-zero, and S3-backing.
  • MUST ensure branching the database branches the vector index atomically with it.
  • MUST keep the working-set HNSW graph resident in the local cache; object storage is the cold floor, never the traversal medium.

Composed capabilities

  • MUST compose better-auth in-process as a library persisting into the embedded engine; no external auth service.
  • MUST let any pg-wire client (PostgREST and others) attach over server mode with no engine modification.
  • MUST compose DuckDB over the shared object store via open columnar snapshots (Parquet/Iceberg), keeping OLTP as source of truth.
  • SHOULD deploy each composed layer as its own process so it scales and idles independently.
  • MAY omit any composed layer entirely; its absence must cost nothing in the core.

Configuration knobs

vector.dim
declared fixed length of a vector column (e.g. 768, 1536); enforced on insert.
vector.metric
distance metric for the HNSW index — cosine | l2 | inner_product.
hnsw.m
graph out-degree; higher = better recall, larger graph, more pages to keep resident.
hnsw.ef_construction
build-time candidate-list size; build quality vs build cost.
hnsw.ef_search
query-time candidate-list size; recall vs latency, the primary online tuning knob.
cache.vector_working_set
cache budget reserved for the HNSW graph, separate from the row page budget.
auth.database
which database/branch better-auth persists into (defaults to the application DB).
rest.enabled
whether a PostgREST layer is provisioned in front of server mode (absent in embedded deployments).
olap.snapshot_interval
cadence of columnar materialization to S3; bounds analytic staleness.
olap.format
columnar publish format — parquet or iceberg.

Failure modes & edge cases

Failure / edge caseWhereMitigation
Cold HNSW traversal cache-misses serially over S3vector searchkeep working-set graph resident; warm upper layers on cold start; size cache for the graph (see Local Cache)
Egress amplification on miss-heavy ANN over egress-billed storesvector searchprefer R2 (zero egress) for vector-heavy workloads; raise cache budget (see Deployment Targets)
Vector-index branch diverges from row branchvector searchindex pages ride the same LSN/layers — branch atomically; never store the graph as a side file
Auth state lost on branch or restorebetter-authpersist all auth state through the normal write path so PITR/branching cover it
PostgREST sees a pg-wire incompatibility (catalog/type quirk)PostgRESTtreat as a server-mode wire-compat gap, fix in Server Mode; do not fork PostgREST
DuckDB reads a half-materialized snapshotDuckDBpublish snapshots atomically (Iceberg commit / atomic Parquet swap); readers see whole snapshots only
Analytic queries contend with the transactional laneDuckDBrun DuckDB as its own process reading materialized snapshots, never the live OLTP pages
Composed layer welded in over time (scope creep)placement disciplineenforce the must-not rules; reject PRs that add interface/service code to the core

Dependencies / existing pieces to start from

  • SHOULD adapt the usearch / hnswlib lineage for the HNSW access method rather than authoring an ANN algorithm from scratch.
  • SHOULD adopt better-auth unmodified as an in-process library, supplying it the engine's binding as its persistence layer.
  • SHOULD adopt PostgREST unmodified, relying on the engine's pg-wire compatibility (see Server Mode).
  • SHOULD adopt DuckDB unmodified, pointed at the shared object store via the Parquet/Iceberg readers it already ships.
  • MUST build only the thin glue: the vector type + HNSW-over-Storage integration, and the columnar materialization job. The composed engines/libraries are not built in-house.

Acceptance criteria / definition of done

  • MUST support a vector(N) column, an HNSW index over it, and a top-k nearest-neighbour query answered by the index.
  • MUST demonstrate that branching a database produces a branch whose vector index is independently mutable and whose changes do not leak back to the base.
  • MUST demonstrate vector-search latency staying flat while the working set is cache-resident and degrading gracefully (not catastrophically) on a forced cold traversal.
  • MUST demonstrate a Bun app where better-auth persists users/sessions into the embedded engine with no external auth service, surviving a branch and a restore.
  • MUST demonstrate unmodified PostgREST serving a working REST API against the engine's pg-wire endpoint.
  • MUST demonstrate DuckDB querying a materialized columnar snapshot of the same dataset the row engine writes, with both reading one S3 location.
  • SHOULD demonstrate that omitting any composed layer leaves the core unchanged in size and behaviour.

Open questions & risks

  • MAY — HNSW page layout: what page granularity and node-packing minimize serial round-trips on cold traversal? Needs benchmarking against the benchmark plan (cold-read experiment extended to graph traversal).
  • MAY — incremental vs full HNSW rebuild on heavy update churn; how does graph quality degrade with deletes, and when must the index be rebuilt?
  • MAY — columnar staleness bound: what snapshot interval keeps DuckDB analytics "fresh enough" without overwhelming the materialization job, and should CDC replace batch?
  • MAY — does any other high-value capability (full-text search, geospatial) belong in-core by the rule, or are they better composed?
  • MAY — pg-wire surface completeness: which catalog/type details must server mode expose so the long tail of pg-wire tools (not just PostgREST) attach unmodified?
  • MAY — authorization beyond authentication: where do row-level security / policy enforcement live — in the engine, in PostgREST, or in the app composing them?

Related specifications

Serverless OLTP Engine — internal development specification. Draft, 2026-06-20. · Author