Capabilities: Build-in vs Compose · Serverless OLTP Engine Spec

Purpose & scope

This specification governs how new capabilities are added to the platform without bloating the engine. It is the placement contract: for any proposed feature it answers the binary question "build it into the engine, or compose it around the engine?" and then specifies how each placement attaches to the existing layers (engine core, storage trait, server mode, object-storage floor).

It expands source §11 into a buildable specification: the deciding rule, the one capability that is built in (vector search), the three capabilities that are composed around (better-auth, PostgREST, DuckDB), the canonical composition map, the contrast with the Postgres-plus-extensions monolith, and the normative requirements that keep the core from accreting interface code.

Relationship to the rest of the spec

This page does not introduce new storage or commit machinery. It composes the pieces already specified elsewhere: the Storage trait (vector index attaches here), server mode / pg wire protocol (PostgREST attaches here), Bun integration (better-auth attaches here), and the object-storage backend (DuckDB attaches here). Capabilities inherit branching, scale-to-zero, and S3-backing only to the extent they sit on top of those existing seams.

The deciding rule

The rule (normative)

If it is a STORAGE or EXECUTION capability — a type, an index, a query operator — build it INTO the engine.

If it is an INTERFACE or a SERVICE — auth, REST, an analytics frontend — COMPOSE it AROUND the engine.

The rule exists because the two kinds of capability have different cost models and different lifecycles. A storage/execution capability is exercised on the data path on every query; it must live where the executor and the page cache live, behind the same Storage trait, or it cannot share MVCC, snapshots, branching, and the cache. An interface/service capability is exercised at the boundary; it is request-shaped, optional, and frequently absent (an embedded deployment has no REST surface at all). Welding a boundary concern into the core would make it un-optional, un-branchable as an independent unit, and impossible to scale or idle separately.

How to classify a proposed capability

Apply this test in order; the first match decides placement.

Question	If yes	Placement
Does it add or change a column type the executor must read/compare/index?	storage capability	build IN
Is it a new access method (index) the planner can choose?	storage capability	build IN
Is it a query operator the executor evaluates over rows/pages?	execution capability	build IN
Is it an authentication / authorization service holding its own state?	service	compose
Is it a request/response surface (REST/GraphQL/RPC) over the data?	interface	compose
Is it a second engine with a different physical layout (columnar/OLAP)?	execution, but separable	compose

The OLAP row at the bottom is the one subtlety: analytics is an execution capability, but it demands a columnar physical layout the row engine deliberately does not have (the HTAP boundary, see DuckDB below). When an execution capability requires a fundamentally different storage layout, it does not go into the row engine — it becomes a second engine composed over the same storage floor.

Contrast: monolith-plus-extensions vs small-core-plus-composition

Postgres is a monolith: a single process that bundles types, indexes, planner, executor, REST-via-extension, vectors-via-extension, analytics-via-extension. Everything is welded into one address space, one scaling unit, one branch unit, one fate. You take the whole thing or none of it; an extension you do not use still ships in the binary, still couples to the same WAL, still scales as one.

This engine is the opposite shape by design (storage is decoupled at the Storage trait, see Architecture Overview): a small core (rows + vectors + the storage trait) surrounded by composable layers that share one object-storage floor. Each composed piece scales, scales-to-zero, and branches independently because it was never welded into the core.

   POSTGRES (monolith + extensions)        THIS ENGINE (small core + composition)

   ┌───────────────────────────────┐       better-auth   PostgREST     DuckDB
   │  planner · executor · types   │       (in-proc lib)  (front layer) (2nd engine)
   │  ┌────────┐ ┌─────────────┐   │            │             │             │
   │  │pgvector│ │ PostgREST*  │   │            ▼             ▼             ▼
   │  ├────────┤ ├─────────────┤   │       ┌─────────────────────────────────┐
   │  │ OLAP*  │ │   auth*     │   │       │   small engine core (rows +       │
   │  └────────┘ └─────────────┘   │       │   vectors + Storage trait)        │
   │        one WAL / one fate     │       └────────────────┬──────────────────┘
   └───────────────┬───────────────┘                        │
                   ▼                        ┌───────────────▼───────────────┐
            one storage volume              │   shared object-storage floor  │
                                            │        (S3 / R2 / GCS)         │
        all scale/branch as one unit        └────────────────────────────────┘
                                              each layer scales/idles/branches
                                                       independently

Left: one welded binary, one storage volume, one fate. Right: a small core plus composable layers, all sharing one object-storage floor, each independently scalable.

Built IN — vector search (pgvector-equivalent)

Vector search is a storage/index capability, not an interface, so by the rule it belongs inside the engine. It is also the single highest-value addition for agent workloads: vector search is how agent memory and retrieval work, so this capability converts the engine from "a small SQL store" into "the substrate an agent runs its memory on."

What is added to the core

MUST add a vector type to the type system — a fixed-length array of f32, length declared at column definition time (e.g. vector(768)), validated on insert.
MUST add an HNSW index as a new access method behind the same Storage trait the B-trees already use; the HNSW graph is stored as pages, not as a side file.
MUST expose distance operators the planner can push into the index scan (cosine, L2, inner-product) so a top-k nearest-neighbour query is answerable by the access method, not by a full scan plus sort.
SHOULD reuse the usearch / hnswlib algorithm lineage adapted to paged storage rather than inventing an ANN algorithm.

Why it inherits branching, scale-to-zero, and S3-backing for free

The HNSW graph is just more pages flowing through the same Storage trait. Because the trait is the only path the index has to durable bytes, the index automatically participates in everything the trait provides:

scale-to-zero: the graph pages live in object storage like any other page; idle compute can be torn down and the index re-warmed on cold start from S3 (see Lifecycle & Controller).
branching: a branch is a new LSN pointer over shared immutable layers; the vector index is made of those layers, so branching the database branches the vector index with it. An agent forks its memory, experiments on the fork, and discards it at near-zero marginal storage cost. Postgres's pgvector cannot do this — its index is welded to a single monolithic volume.
S3-backing / PITR: the graph is versioned by LSN exactly like row pages, so point-in-time recovery and the durability floor cover it without special-casing.

The branching payoff is the differentiator

An agent that can fork its own retrieval memory, run a speculative episode against the fork, and throw the fork away is doing something pgvector structurally cannot. This capability exists only because the index rides the same versioned, immutable-layer storage trait as the rows.

The caveat: HNSW is the nastiest cache case in the engine

Cold-traversal amplification

HNSW is a graph, and traversal is random-access pointer-chasing. Over object storage a cold traversal can become many sequential round-trips — each hop is a potential cache miss, and on egress-billed stores a cache miss is egress. This is "cache-miss-is-egress" at its absolute worst, because the misses are serial (you cannot prefetch hop n+1 until hop n resolves).

Vector search therefore leans harder on the local cache than anything else in the engine. The mitigation is structural, not incidental:

MUST keep the working-set graph resident in the local cache; object storage is the cold floor, not the traversal medium.
SHOULD size the cache against the graph working set, not just the row working set — the two compete and the graph's miss penalty is multiplied by traversal depth.
SHOULD warm the upper HNSW layers (the small, dense entry-point layers) eagerly on cold start so the first hops never miss.
MAY pin frequently-queried sub-graphs to keep latency-critical retrieval off the cold path entirely.

Composed AROUND — overview

The next three capabilities are NOT built into the core. They wrap the engine, they are optional, and they are absent when the engine is embedded in-process (an embedded deployment has rows and vectors but no auth service, no REST surface, no second analytics engine unless the host app composes them). Each attaches at a different seam, and the seam determines how it scales and branches.

in-process

better-auth — library in the Bun app

in-front

PostgREST — layer over server mode

over-storage

DuckDB — second engine on shared S3

Composed AROUND — better-auth (auth)

better-auth fills the "auth" slot of the original BaaS wishlist. It is a library, not a service, and that is exactly why it composes cleanly: it stores its own state — users, sessions, accounts — in your database rather than in an external auth provider.

How it attaches

better-auth runs in-process, inside the Bun app, alongside the embedded engine. It writes users and sessions to the same in-process database via the engine's FFI/native binding (see Bun Integration). That database syncs to object storage like any other write. There is no external auth service and no second network hop for a login.

Bun process
 ├─ better-auth (library)  ──  writes users / sessions ─┐
 │                                                      │
 └─ engine (embedded lib)  ←───────────────────────────┘  ──►  object storage
        (rows + vectors,                                        (S3 / R2 / GCS)
         in-process via FFI)

better-auth is a library co-resident in the Bun process; it persists into the embedded engine, which syncs to the shared object-storage floor. No external auth service.

Behaviour & consequences

MUST NOT be built into the engine core — auth is a service holding its own schema, and the engine must remain auth-agnostic so it can be embedded with or without it.
MUST persist all auth state (users, sessions, accounts, verification tokens) through the normal engine write path so it inherits durability, branching, and PITR.
SHOULD live in the same database (or a same-instance branch) as the application data so a session lookup is a local function call, not a cross-service round-trip.
MAY be omitted entirely in deployments that do not need auth — its absence costs nothing because it was never welded in.

Why "library, in-process" is the right shape here

Because better-auth's state lives in the engine, branching the database branches the auth state with it (a staging branch gets its own user table for free), and because the lookup is in-process it does not pay the networked-commit handoff for reads. This is the most drop-in deployment shape — see Deployment Targets, scenario 3 (Bun app + better-auth on Cloud Run + bucket).

Composed AROUND — PostgREST (REST API)

PostgREST introspects a schema and auto-generates a REST API from it. It is a separate process that sits in front of the database — and that is the correct placement, because an auto-REST layer is an interface, and interfaces belong at the boundary wrapping server mode (not the embedded mode).

The key payoff: you may not build a REST layer at all

Because the engine speaks the Postgres wire protocol in server mode (see Server Mode & Wire Protocol), PostgREST can point at the engine directly. PostgREST already exists, already introspects pg-compatible catalogs, and already maps HTTP verbs to SQL. The engine inherits a mature REST surface for free — wire-compatibility means you do not write a REST layer, you adopt the existing one.

HTTP client
    │  GET /items?select=name&price=gt.10
    ▼
PostgREST  (introspects schema, maps HTTP ⇄ SQL, auto-generates REST)
    │  pg wire protocol
    ▼
engine (server mode, speaks pgwire)
    │  Storage trait
    ▼
object storage  (S3 / R2 / GCS)

PostgREST is a front layer over server mode. Because the engine is wire-compatible, PostgREST attaches with zero engine changes — the REST API is inherited, not built.

Requirements

MUST NOT be built into the engine — it is an interface and must remain optional and absent in embedded deployments.
MUST attach only over server mode; the engine's pg-wire compatibility is the contract that makes PostgREST work unmodified.
SHOULD be deployed as its own process/container so it scales and idles independently of the engine.
SHOULD sit behind the same pooler the server mode uses for connection-burst absorption (PgBouncer/pgcat, transaction mode).
MAY be replaced by any other pg-wire-speaking front layer (GraphQL gateways, ORMs, BI tools) for the same reason — wire-compatibility is the general unlock, PostgREST is one instance of it.

Acceptance check for "REST for free"

The deliverable is not "we built REST"; it is "unmodified PostgREST, pointed at the engine's pgwire endpoint, serves a working REST API against a real schema." If that holds, the engine's wire-compatibility is doing its job and no REST code was written.

Composed AROUND — DuckDB (analytics / OLAP)

Analytics is an execution capability, but it demands a columnar physical layout the row engine deliberately does not have. This is the HTAP boundary: do not make the row engine analytical — OLAP scans want column-at-a-time vectorized execution over compressed columnar storage, which is structurally at odds with the row-oriented, MVCC, single-writer OLTP layout the engine is built for. Bolting OLAP into the row engine would compromise both.

How it attaches: a second engine over shared storage

Instead of one HTAP engine, build an HTAP system: point DuckDB at the same object storage as Parquet/Iceberg. The row engine handles transactions; DuckDB handles aggregations; both see one S3 dataset. This is possible precisely because storage is decoupled — the durability floor is shared, so a second engine can read it without going through the first.

              ┌─ engine (row, OLTP)   ── writes ──┐
              │                                    │
 shared S3 ───┤                                    ├── one dataset
 (R2/GCS)     │                                    │
              └─ DuckDB (columnar, OLAP) ── reads ─┘

 periodic:  engine row data ──► materialize columnar
            (Parquet / Iceberg) snapshots on S3 ──► DuckDB queries them

Two engines, one storage floor. The row engine writes; DuckDB reads materialized columnar snapshots of the same S3 dataset. A legitimate HTAP system, not a single HTAP engine.

The materialization contract

MUST NOT add columnar/OLAP execution to the row engine; the HTAP boundary is held by separation, not by a hybrid layout.
MUST publish row data to the shared store in an open columnar format (Parquet, optionally catalogued via Iceberg) that DuckDB reads directly.
SHOULD materialize columnar snapshots periodically (batch/CDC), accepting analytic staleness bounded by the snapshot interval; OLTP stays the source of truth.
SHOULD run DuckDB as its own process so the analytic workload neither contends with nor blocks the transactional lane.
MAY expose the Iceberg table to other readers (BI tools) — the columnar snapshot is a shared artifact, not a private one.

Why this is legitimate HTAP and not a hack

It is two engines over one storage floor, each playing to its layout's strength, sharing a dataset rather than copying it through a pipeline. The decoupled storage trait is what makes a second engine on the same bytes possible — a coupled OLTP database would force an export/replicate hop instead.

The composition map

The canonical placement table. Each capability's kind determines its placement, and the placement determines how it attaches to the existing layers.

Capability	Kind	Placement	How it attaches
Vector search (pgvector-equiv)	storage / index	built IN	new index type via the `Storage` trait; inherits branching + scale-to-zero + S3-backing
better-auth	service (auth)	composed in-process	library; stores its state in the embedded engine, syncs to object storage
PostgREST	interface (REST)	composed in front	wraps server mode; free via pg wire-compatibility, no engine changes
DuckDB	execution (OLAP)	composed over storage	second engine reading shared S3 (Parquet / Iceberg snapshots)

Why this beats Postgres + extensions

The capabilities are the same; the bundling is the opposite. Postgres ships them welded into one binary, one WAL, one scaling unit. This engine ships them unbundled: a tool that needs vectors + auth but not REST pulls exactly those, in-process, and pays for nothing else.

Property	Postgres + extensions	Small core + composition
Granularity of adoption	whole monolith; unused extensions still ship	pull exactly the capabilities a tool needs
Scaling unit	one process scales as a unit	each layer scales / idles independently
Branching	monolithic volume; pgvector cannot branch its index	built-in capabilities branch with the DB; composed layers branch with their own state
Scale-to-zero	coupled compute + storage; not idle-friendly	stateless compute over object storage; idle to zero
Embedded form	not embeddable as a library	core embeds in-process; interface layers simply absent
Engine size / blast radius	large; one fate shared across features	engine stays small; failures isolated per layer

The structural claim: each layer scales, idles, and branches independently because nothing is welded to the core. The engine stays small; the platform grows by composition around shared object storage. That is the inversion of the monolith, and it is only possible because storage was decoupled at the Storage trait in the first place.

Normative requirements

Placement discipline

MUST classify every proposed capability with the deciding rule before implementation: storage/execution → in-core; interface/service → composed.
MUST keep the engine core limited to rows, vectors, and the storage trait; no interface or service code may enter the core.
MUST NOT add an interface or service (auth, REST, analytics frontend) into the engine binary; such capabilities are composed and optional.
MUST NOT add columnar/OLAP execution to the row engine; OLAP is composed as a second engine over shared storage.

Built-in capabilities (vector search)

MUST implement the vector type and HNSW access method behind the existing Storage trait, storing the graph as pages so it inherits branching, scale-to-zero, and S3-backing.
MUST ensure branching the database branches the vector index atomically with it.
MUST keep the working-set HNSW graph resident in the local cache; object storage is the cold floor, never the traversal medium.

Composed capabilities

MUST compose better-auth in-process as a library persisting into the embedded engine; no external auth service.
MUST let any pg-wire client (PostgREST and others) attach over server mode with no engine modification.
MUST compose DuckDB over the shared object store via open columnar snapshots (Parquet/Iceberg), keeping OLTP as source of truth.
SHOULD deploy each composed layer as its own process so it scales and idles independently.
MAY omit any composed layer entirely; its absence must cost nothing in the core.

Configuration knobs

vector.dim: declared fixed length of a vector column (e.g. 768, 1536); enforced on insert.
vector.metric: distance metric for the HNSW index — cosine | l2 | inner_product.
hnsw.m: graph out-degree; higher = better recall, larger graph, more pages to keep resident.
hnsw.ef_construction: build-time candidate-list size; build quality vs build cost.
hnsw.ef_search: query-time candidate-list size; recall vs latency, the primary online tuning knob.
cache.vector_working_set: cache budget reserved for the HNSW graph, separate from the row page budget.
auth.database: which database/branch better-auth persists into (defaults to the application DB).
rest.enabled: whether a PostgREST layer is provisioned in front of server mode (absent in embedded deployments).
olap.snapshot_interval: cadence of columnar materialization to S3; bounds analytic staleness.
olap.format: columnar publish format — parquet or iceberg.

Failure modes & edge cases

Failure / edge case	Where	Mitigation
Cold HNSW traversal cache-misses serially over S3	vector search	keep working-set graph resident; warm upper layers on cold start; size cache for the graph (see Local Cache)
Egress amplification on miss-heavy ANN over egress-billed stores	vector search	prefer R2 (zero egress) for vector-heavy workloads; raise cache budget (see Deployment Targets)
Vector-index branch diverges from row branch	vector search	index pages ride the same LSN/layers — branch atomically; never store the graph as a side file
Auth state lost on branch or restore	better-auth	persist all auth state through the normal write path so PITR/branching cover it
PostgREST sees a pg-wire incompatibility (catalog/type quirk)	PostgREST	treat as a server-mode wire-compat gap, fix in Server Mode; do not fork PostgREST
DuckDB reads a half-materialized snapshot	DuckDB	publish snapshots atomically (Iceberg commit / atomic Parquet swap); readers see whole snapshots only
Analytic queries contend with the transactional lane	DuckDB	run DuckDB as its own process reading materialized snapshots, never the live OLTP pages
Composed layer welded in over time (scope creep)	placement discipline	enforce the must-not rules; reject PRs that add interface/service code to the core

Dependencies / existing pieces to start from

SHOULD adapt the usearch / hnswlib lineage for the HNSW access method rather than authoring an ANN algorithm from scratch.
SHOULD adopt better-auth unmodified as an in-process library, supplying it the engine's binding as its persistence layer.
SHOULD adopt PostgREST unmodified, relying on the engine's pg-wire compatibility (see Server Mode).
SHOULD adopt DuckDB unmodified, pointed at the shared object store via the Parquet/Iceberg readers it already ships.
MUST build only the thin glue: the vector type + HNSW-over-Storage integration, and the columnar materialization job. The composed engines/libraries are not built in-house.

Acceptance criteria / definition of done

MUST support a vector(N) column, an HNSW index over it, and a top-k nearest-neighbour query answered by the index.
MUST demonstrate that branching a database produces a branch whose vector index is independently mutable and whose changes do not leak back to the base.
MUST demonstrate vector-search latency staying flat while the working set is cache-resident and degrading gracefully (not catastrophically) on a forced cold traversal.
MUST demonstrate a Bun app where better-auth persists users/sessions into the embedded engine with no external auth service, surviving a branch and a restore.
MUST demonstrate unmodified PostgREST serving a working REST API against the engine's pg-wire endpoint.
MUST demonstrate DuckDB querying a materialized columnar snapshot of the same dataset the row engine writes, with both reading one S3 location.
SHOULD demonstrate that omitting any composed layer leaves the core unchanged in size and behaviour.

Open questions & risks

MAY — HNSW page layout: what page granularity and node-packing minimize serial round-trips on cold traversal? Needs benchmarking against the benchmark plan (cold-read experiment extended to graph traversal).
MAY — incremental vs full HNSW rebuild on heavy update churn; how does graph quality degrade with deletes, and when must the index be rebuilt?
MAY — columnar staleness bound: what snapshot interval keeps DuckDB analytics "fresh enough" without overwhelming the materialization job, and should CDC replace batch?
MAY — does any other high-value capability (full-text search, geospatial) belong in-core by the rule, or are they better composed?
MAY — pg-wire surface completeness: which catalog/type details must server mode expose so the long tail of pg-wire tools (not just PostgREST) attach unmodified?
MAY — authorization beyond authentication: where do row-level security / policy enforcement live — in the engine, in PostgREST, or in the app composing them?

Related specifications

STORStorage Interfacethe trait the HNSW vector index attaches behind, inheriting branching and S3-backing SRVServer Mode & Wire Protocolpg-wire compatibility is why PostgREST attaches for free BUNBun Integrationthe in-process binding better-auth uses to persist into the embedded engine CACHELocal Cachevector search leans hardest on the cache; cold HNSW traversal is the worst case OBJObject-Storage Backendthe shared S3 floor DuckDB reads as Parquet/Iceberg snapshots DEPLOYDeployment TargetsR2 zero-egress matters most for miss-heavy vector search; composed-layer placement ARCHArchitecture Overviewthe small-core-plus-composition shape this page applies to new capabilities ROADRoadmap & Build Sequencewhen vector search and each composed layer land in the build order