Server Mode & Wire Protocol
Server mode is the same engine library wrapped in a wire-protocol listener — the engine-server binary — that speaks a defined subset of the Postgres wire protocol so PostgREST and Bun.sql connect for free; the listener is the only thing that differs from the embedded path, and everything below it is byte-for-byte the embedded engine.
Purpose & scope
This spec expands the server side of source §1 (the Slot A interface layer and the protocols between layers) and source §2.A (the engine-server build output), plus the pooler mentions in §1, §5, and §6. It defines what engine-server is, the network protocols it terminates, how a connection maps onto an engine instance and a database/branch, and the operational surface — pooling, TLS, auth, metrics, drain, backpressure — that only exists in server mode.
The single most important framing, carried verbatim from the source: server mode is just "the same library, wrapped in a wire-protocol listener." The SQL parser, planner, executor, MVCC manager, local cache, and Storage trait are identical to the embedded build (02 — Engine Core). Server mode adds exactly one new thing on the inbound edge: a protocol listener that turns wire messages into the same engine calls that bun:ffi makes directly in-process.
Note — one engine, two front doors
Embedded (08 — Bun Integration, Path 1) and server (Path 2) are the same libengine with a different front door. Embedded: App ↔ Engine via direct C-ABI function calls, no protocol. Server: App ↔ Engine via the Postgres wire protocol or HTTP. The Engine ↔ Storage RPC and Storage ↔ Durability S3 API are unchanged and remain internal in both modes.
Responsibilities & non-goals
Responsibilities
- MUST link the unmodified engine library and expose it over a network listener, with no fork of engine behaviour between embedded and server builds.
- MUST implement a defined subset of the Postgres wire protocol (pgwire) sufficient for Bun.sql and PostgREST to connect and run parameterized queries.
- MUST route each connection to exactly one engine/database/branch and enforce single-writer-per-DB via the fencing token (source §4; 06).
- SHOULD offer an alternative HTTP/WS request/response API for environments where a raw TCP pgwire listener is awkward (e.g. the edge).
- SHOULD sit behind a transaction-mode pooler (PgBouncer / pgcat) for serverless connection bursts (source §5, §6).
- SHOULD expose TLS, authentication, observability/metrics, graceful drain, and backpressure as first-class server concerns.
Non-goals
- MUST NOT reimplement SQL parsing, planning, MVCC, or WAL generation — those are the engine core (02); the listener only marshals messages to and from it.
- MUST NOT add multi-writer concurrency for a single database; the single-writer-per-DB ceiling is deliberate (source §4) and is enforced, not worked around, here.
- MUST NOT bundle a connection pooler into engine-server; the pooler is a separate, composable process in front (PgBouncer / pgcat).
- MUST NOT implement the auto-REST layer itself; that is PostgREST composed in front (see 12, Capabilities). The server only has to be wire-compatible.
- Aiming for full Postgres wire/SQL fidelity is out of scope; the target is the subset real clients in this stack actually use.
Overview & architecture
The listener is a thin shell. Inbound bytes are framed into protocol messages, authenticated once at startup, mapped to an engine handle, and translated into the same calls the embedded path uses. Results flow back out as protocol messages. Nothing about durability, MVCC, or storage changes.
serverless clients (bursty, short-lived)
Bun.sql · PostgREST · psql · any pg driver
│ many short connections
▼
┌───────────────────────────────┐
│ POOLER (server mode only) │ PgBouncer / pgcat
│ TRANSACTION mode │ N client conns → few server conns
└───────────────┬───────────────┘
│ few long-lived backend conns
▼
┌───────────────────────────────┐
│ engine-server (listener) │ pgwire | HTTP/WS
│ ─ framing / auth / TLS │
│ ─ connection → db routing │
│ ─ message ⇄ engine calls │
└───────────────┬───────────────┘
│ direct function calls (NO protocol)
▼
┌───────────────────────────────┐
│ ENGINE LIBRARY (02) │ identical to embedded build
│ parser·planner·exec·MVCC │
│ local cache (05) │
│ >>> Storage trait <<< │
└───────────────┬───────────────┘
│ Append(WAL) / GetPage@LSN (internal RPC)
▼
Storage backend (03/04) → S3 / R2 / MinIO
Server mode = pooler (optional, in front) → listener (the only new code) → the embedded engine, unchanged. The seam between listener and engine (the "direct function calls" edge) is in-process function calls, not a protocol.
Critically, the boundary below the listener is the embedded boundary: the listener calls engine_open / engine_query (or richer extended-query entry points) over the same C ABI that bun:ffi binds in Path 1. This is why "the listener is the only difference" is literally true and is the property the rest of this page is built to preserve.
Postgres wire protocol — supported subset
The engine speaks the Postgres frontend/backend protocol (protocol version 3.0) over TCP. Speaking it is what lets Bun.sql connect with zero extra deps and lets PostgREST point at the engine directly (source §5, §11; 12, 08). The target is a subset — the message flows real clients in this stack use — not bug-for-bug Postgres fidelity.
Startup & authentication
The connection opens with an unauthenticated StartupMessage (no leading type byte) carrying parameters including user and database. The server then drives an auth handshake and, on success, emits server parameters and signals readiness.
- SSLRequest: Optional pre-startup probe. The server replies with a single byte: S (proceed with TLS) or N (cleartext). See TLS & auth below.
- StartupMessage: Carries user, database (interpreted as a database/branch selector; see routing), and other parameters. The database field is the primary routing key.
- Authentication*: The server requests an auth method; SCRAM-SHA-256 is the target (AuthenticationSASL), with optional cleartext/MD5 only behind TLS for legacy drivers.
- ParameterStatus / BackendKeyData: The server announces server_version, client_encoding, DateStyle, etc., and a cancel key.
- ReadyForQuery: Terminates startup and carries the transaction status (I idle, T in a transaction, E in a failed transaction). Emitted after every command cycle thereafter.
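To make the startup flow concrete, here is a minimal Rust sketch of how a listener might classify the first frame on a new connection. SSLRequest, CancelRequest, and StartupMessage all lack the leading type byte and are told apart by a 32-bit code. The constants are the standard protocol codes. `FirstFrame` and `parse_first_frame` are hypothetical names. The sketch is for illustration only, since the spec recommends taking codecs from an existing library rather than hand-rolling them.

```rust
use std::collections::HashMap;

// Standard pgwire v3 codes for frames that carry no leading type byte.
const PROTOCOL_3_0: i32 = 196_608; // 3 << 16
const SSL_REQUEST: i32 = 80_877_103; // (1234 << 16) | 5679
const CANCEL_REQUEST: i32 = 80_877_102; // (1234 << 16) | 5678

#[derive(Debug, PartialEq)]
enum FirstFrame {
    SslRequest,
    Cancel { pid: i32, secret: i32 },
    Startup(HashMap<String, String>),
}

fn read_i32(buf: &[u8], at: usize) -> Option<i32> {
    let b = buf.get(at..at + 4)?;
    Some(i32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// `frame` is the whole message: Int32 length (counting itself), Int32 code, body.
fn parse_first_frame(frame: &[u8]) -> Result<FirstFrame, String> {
    let len = read_i32(frame, 0).ok_or("short frame")?;
    if len < 8 || len as usize != frame.len() {
        return Err(format!("bad length {}", len));
    }
    match read_i32(frame, 4).ok_or("short frame")? {
        SSL_REQUEST => Ok(FirstFrame::SslRequest),
        CANCEL_REQUEST => {
            let pid = read_i32(frame, 8).ok_or("short cancel")?;
            let secret = read_i32(frame, 12).ok_or("short cancel")?;
            Ok(FirstFrame::Cancel { pid, secret })
        }
        PROTOCOL_3_0 => {
            // Body is name\0value\0 ... pairs, terminated by one extra \0.
            let mut params = HashMap::new();
            let mut parts = frame[8..].split(|b| *b == 0);
            loop {
                let key = parts.next().ok_or("unterminated startup")?;
                if key.is_empty() {
                    break;
                }
                let value = parts.next().ok_or("missing value")?;
                params.insert(
                    String::from_utf8_lossy(key).into_owned(),
                    String::from_utf8_lossy(value).into_owned(),
                );
            }
            Ok(FirstFrame::Startup(params))
        }
        other => Err(format!("unsupported protocol code {}", other)),
    }
}
```

A `Startup` result hands its `database` value to the routing step. An `SslRequest` is answered with one byte before anything else is read.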
server_version must satisfy client probes
Bun.sql, PostgREST, and most drivers probe server_version and a handful of catalog queries on connect. The server MUST report a server_version the clients accept and MUST answer the small set of introspection queries those clients issue during handshake (see Open questions). Failing this is the most common reason a "wire-compatible" engine fails to connect in practice.
Simple query protocol
One Query message carries a SQL string (possibly multiple statements); the server replies with a row description, data rows, a command tag, and ReadyForQuery. This is the path psql uses for ordinary commands and the one drivers fall back to for unparameterized or multi-statement strings (Bun.sql's parameterized tagged templates use the extended path below). It is also the lowest-effort path to a working demo.
C: Query "SELECT id, name FROM t WHERE active"
S: RowDescription [ {id: int4}, {name: text} ]
S: DataRow [ 1, "ada" ]
S: DataRow [ 2, "bel" ]
S: CommandComplete "SELECT 2"
S: ReadyForQuery 'I' (idle, not in a transaction)
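The trace above uses four backend message types. The Rust sketch below shows their byte layout in text format. The helper names are hypothetical. The field layout follows the pgwire v3 message formats, where type OID 23 is int4 (length 4) and 25 is text (variable length, -1).

```rust
/// Wrap a body as: Byte1 tag, Int32 length (counts itself, not the tag), body.
fn frame(tag: u8, body: &[u8]) -> Vec<u8> {
    let mut out = vec![tag];
    out.extend_from_slice(&((body.len() + 4) as i32).to_be_bytes());
    out.extend_from_slice(body);
    out
}

/// RowDescription ('T'): one entry per column as (name, type OID, type length).
fn row_description(fields: &[(&str, i32, i16)]) -> Vec<u8> {
    let mut body = (fields.len() as i16).to_be_bytes().to_vec();
    for &(name, type_oid, type_len) in fields {
        body.extend_from_slice(name.as_bytes());
        body.push(0);
        body.extend_from_slice(&0i32.to_be_bytes()); // table OID: 0 = not a table column
        body.extend_from_slice(&0i16.to_be_bytes()); // column attribute number
        body.extend_from_slice(&type_oid.to_be_bytes());
        body.extend_from_slice(&type_len.to_be_bytes());
        body.extend_from_slice(&(-1i32).to_be_bytes()); // type modifier: none
        body.extend_from_slice(&0i16.to_be_bytes()); // format code 0 = text
    }
    frame(b'T', &body)
}

/// DataRow ('D'): column count, then each value as Int32 length + bytes.
fn data_row(cols: &[Option<&str>]) -> Vec<u8> {
    let mut body = (cols.len() as i16).to_be_bytes().to_vec();
    for col in cols {
        match *col {
            Some(v) => {
                body.extend_from_slice(&(v.len() as i32).to_be_bytes());
                body.extend_from_slice(v.as_bytes());
            }
            None => body.extend_from_slice(&(-1i32).to_be_bytes()), // SQL NULL
        }
    }
    frame(b'D', &body)
}

/// CommandComplete ('C'): the command tag as a C string, e.g. "SELECT 2".
fn command_complete(tag: &str) -> Vec<u8> {
    let mut body = tag.as_bytes().to_vec();
    body.push(0);
    frame(b'C', &body)
}

/// ReadyForQuery ('Z'): a single status byte, b'I', b'T', or b'E'.
fn ready_for_query(status: u8) -> Vec<u8> {
    frame(b'Z', &[status])
}
```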
Extended query protocol
The parameterized path: Parse → Bind → Describe → Execute → Sync. This is what real drivers (including Bun.sql's parameterized queries) prefer because it separates the SQL text from values and supports prepared statements and binary result formats. It MUST be implemented for the engine to be more than a demo.
C: Parse stmt="s1" sql="SELECT id FROM t WHERE name = $1" paramtypes=[text]
C: Bind portal="" stmt="s1" params=["ada"] resultfmt=[binary]
C: Describe portal=""
C: Execute portal="" max_rows=0
C: Sync
S: ParseComplete
S: BindComplete
S: RowDescription [ {id: int4} ]
S: DataRow [ 0x00000001 ] (binary int4)
S: CommandComplete "SELECT 1"
S: ReadyForQuery 'T' (assumes an earlier BEGIN; otherwise Sync ends the implicit transaction and the status is 'I')
- Parse: Compile SQL text into a named or unnamed prepared statement, optionally declaring parameter type OIDs. Maps to engine parse/plan, cached by statement name for the connection lifetime.
- Bind: Bind parameter values (text or binary format) to a prepared statement, producing a portal, and specify per-column result formats (text/binary).
- Describe: Request the RowDescription (for a portal) or ParameterDescription + RowDescription (for a statement) without executing.
- Execute: Run a portal, optionally bounded by a row limit; emits DataRow* then CommandComplete (or PortalSuspended if the limit is hit).
- Sync: Close the implicit transaction block, flush pending errors, and emit ReadyForQuery. On an error mid-batch, the server discards messages until the next Sync.
- Close: Free a named statement or portal (acknowledged with CloseComplete).
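The Sync rule is the easiest part of the extended protocol to get wrong. The Rust sketch below models it as a small state machine. After an error, every message up to the next Sync is dropped. Sync then reports 'E' only if an explicit BEGIN block has failed. Otherwise it reports 'I', because Sync ends the failed implicit transaction. `Session`, `Msg`, and `Reply` are hypothetical names, and the `fails` flag stands in for the engine rejecting a message.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Msg { Parse, Bind, Describe, Execute, Close, Sync }

#[derive(Debug, PartialEq)]
enum Reply { Processed(Msg), Error(&'static str), Ready(char) }

struct Session {
    discarding: bool,      // true between an error and the next Sync
    in_explicit_txn: bool, // inside BEGIN ... COMMIT/ROLLBACK
    txn_failed: bool,      // explicit txn hit an error; only ROLLBACK clears it
}

impl Session {
    fn new(in_explicit_txn: bool) -> Self {
        Session { discarding: false, in_explicit_txn, txn_failed: false }
    }

    /// Handle one frontend message; `fails` simulates the engine rejecting it.
    /// Returns None for messages that are silently discarded.
    fn handle(&mut self, msg: Msg, fails: bool) -> Option<Reply> {
        if msg == Msg::Sync {
            self.discarding = false;
            let status = if self.txn_failed {
                'E'
            } else if self.in_explicit_txn {
                'T'
            } else {
                'I' // Sync committed or rolled back the implicit transaction
            };
            return Some(Reply::Ready(status));
        }
        if self.discarding {
            return None;
        }
        if fails {
            self.discarding = true;
            self.txn_failed |= self.in_explicit_txn;
            return Some(Reply::Error("42601")); // e.g. syntax_error, sent as ErrorResponse
        }
        Some(Reply::Processed(msg))
    }
}
```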
Error & notice responses
Errors and notices are field-tagged messages the engine must produce so drivers can classify failures. Each carries a severity, a SQLSTATE code, and a human message at minimum.
| Message | When | Key fields |
|---|---|---|
| ErrorResponse | Statement fails (syntax, constraint, fenced writer, etc.) | S severity, C SQLSTATE, M message; optional D detail, H hint, P position |
| NoticeResponse | Non-fatal warning | Same shape; does not abort the command |
| ReadyForQuery | After error + Sync | Reports E when the transaction is in a failed state |
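Here is a minimal Rust sketch of the ErrorResponse/NoticeResponse byte layout. The message starts with a type byte and a length. It continues with a sequence of one-byte field codes, each followed by a NUL-terminated string, and ends with a zero byte. The field codes (S, V, C, M, D, H) come from the pgwire message format. `PgError` and `encode` are hypothetical names.

```rust
struct PgError<'a> {
    severity: &'a str, // "ERROR", "FATAL", "WARNING", ...
    sqlstate: &'a str, // five characters, e.g. "55P03"
    message: &'a str,
    detail: Option<&'a str>,
    hint: Option<&'a str>,
}

/// `tag` is b'E' for ErrorResponse or b'N' for NoticeResponse.
fn encode(tag: u8, e: &PgError) -> Vec<u8> {
    let mut body = Vec::new();
    {
        let mut field = |code: u8, value: &str| {
            body.push(code);
            body.extend_from_slice(value.as_bytes());
            body.push(0); // each field value is a C string
        };
        field(b'S', e.severity); // localized severity
        field(b'V', e.severity); // non-localized severity (sent by PG 9.6+)
        field(b'C', e.sqlstate);
        field(b'M', e.message);
        if let Some(d) = e.detail {
            field(b'D', d);
        }
        if let Some(h) = e.hint {
            field(b'H', h);
        }
    }
    body.push(0); // terminator after the last field
    let mut out = vec![tag];
    out.extend_from_slice(&((body.len() + 4) as u32).to_be_bytes());
    out.extend_from_slice(&body);
    out
}
```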
- MUST implement protocol 3.0 startup, SCRAM auth, the simple query path, and the full extended query path (Parse/Bind/Describe/Execute/Sync/Close).
- MUST emit ErrorResponse with a valid SQLSTATE for every failure mode, including a defined code for a fenced/lost-writer write attempt.
- MUST report a server_version and answer the connect-time introspection queries that Bun.sql and PostgREST issue.
- SHOULD support binary parameter and result formats, not only text, for driver compatibility and throughput.
- SHOULD honor CancelRequest (out-of-band cancel via the BackendKeyData secret) to abort a long-running query.
- MAY support the COPY sub-protocol later; it is not required for the initial client set.
Start from an existing protocol library
Do not hand-roll framing. Start from pgwire (Rust) for the native engine build, or jackc/pgproto3 (Go) if a Go shim is preferred (source §6). These give the message codecs; this spec's job is to map the decoded messages onto engine calls and back.
Wire-compatibility payoff
Speaking pgwire is not a feature for its own sake — it is what makes two ecosystem pieces attach with zero bespoke code:
- Bun.sql (client, free): Bun ships a built-in Postgres client. Expose pgwire and a Bun app connects with new SQL("postgres://…"), with no extra dependency and no custom driver (source §5, Path 2; 08).
- PostgREST (REST layer, free): PostgREST introspects a schema and auto-generates a REST API. Because the engine speaks pg wire in server mode, PostgREST points at it directly, so you need not build a REST layer at all and instead inherit the existing one (source §11; 12).
The corollary for this spec: the wire subset is driven by what these two clients (plus psql and generic drivers) actually emit. The Open questions section pins the exact catalog/introspection queries that must be answered.
Alternative HTTP/WS API
For environments where a raw TCP pgwire listener is awkward — notably the edge, where the runtime may not expose arbitrary TCP and connections are HTTP-shaped — engine-server SHOULD offer a minimal HTTP/WS request/response API over the same engine. This is an interface alternative only; it does not change the engine or storage.
Request shape
POST /v1/query
Authorization: Bearer <token>
{
"database": "mydb", // routing key; may name a branch: "mydb@feature-x"
"sql": "SELECT id FROM t WHERE name = $1",
"params": ["ada"], // positional, $1-based
"txn": null // or a txn handle from a prior BEGIN
}
Response shape
200 OK
{
"columns": [ { "name": "id", "type": "int4" } ],
"rows": [ [ 1 ] ],
"command": "SELECT",
"row_count": 1,
"commit_lsn": 873 // present for write commits; the durable commit point
}
409 Conflict // fenced / not the current writer
{ "error": { "code": "55P03", "message": "not the writer for mydb" } }
- WS streaming: A WebSocket variant MAY carry the same JSON frames for multi-statement sessions and result streaming, keeping one socket warm for a session instead of one HTTP request per statement.
- Transactions: A BEGIN returns an opaque txn handle; subsequent requests pass it, and COMMIT/ROLLBACK closes it. Without a handle, each request is its own implicit transaction.
- SHOULD expose an HTTP/WS API as an alternative front door for edge/HTTP-only runtimes; it MUST map onto the identical engine calls as pgwire.
- MUST return the same SQLSTATE codes (in the JSON error object) that pgwire would return, so client error handling is uniform across both front doors.
- MUST return commit_lsn on write commits and MUST NOT return it before the WAL is durable (the §8 durability rule applies regardless of front door).
- MAY omit the HTTP/WS API on native targets where pgwire is sufficient; it is required only where raw TCP is impractical.
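To keep error handling uniform across front doors, the HTTP layer can carry the pgwire SQLSTATE unchanged and derive only the HTTP status from it. In the Rust sketch below, only the 55P03 → 409 pairing comes from the response example above. The other status mappings are assumptions, and `http_status`, `json_escape`, and `error_response` are hypothetical names.

```rust
/// HTTP status derived from a SQLSTATE; the SQLSTATE itself is passed through unchanged.
fn http_status(sqlstate: &str) -> u16 {
    match sqlstate {
        "55P03" => 409,                  // fenced / not the current writer (from the example)
        "53300" => 503,                  // assumed: overload / backpressure
        s if s.starts_with("28") => 401, // assumed: invalid authorization class
        s if s.starts_with("42") => 400, // assumed: syntax / access rule class
        _ => 500,
    }
}

/// Minimal JSON string escaping for the error body.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the (status, JSON body) pair for a failed request.
fn error_response(sqlstate: &str, message: &str) -> (u16, String) {
    let body = format!(
        r#"{{ "error": {{ "code": "{}", "message": "{}" }} }}"#,
        json_escape(sqlstate),
        json_escape(message)
    );
    (http_status(sqlstate), body)
}
```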
Connection → engine/database routing
A connection must resolve to exactly one engine instance bound to one database (and optionally one branch). The routing key is the pgwire database startup field (or the database field in the HTTP/WS body). A branch is selected by an agreed suffix syntax — db@branch — resolving to a branch pointer (06).
connection startup:
user="svc" database="mydb@feature-x"
│
▼
parse routing key → (db_id="mydb", branch="feature-x")
│
▼
controller: is an engine for (db_id, branch) resident?
─ yes → attach connection to it
─ no → cold-start engine, open Storage at branch base_lsn,
warm cache, then attach (scale-to-zero wakeup, 06)
│
▼
WRITE intent → must hold the writer lease (fencing token).
one writer per DB; readers attach freely to a snapshot LSN.
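The routing-key split is small but worth pinning down. The Rust sketch below assumes `@` is the agreed branch separator and that empty parts or a second `@` are rejected (both assumptions). `Route` and `parse_routing_key` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
struct Route<'a> {
    db_id: &'a str,
    branch: Option<&'a str>, // None: no branch suffix was given
}

fn parse_routing_key<'a>(key: &'a str) -> Result<Route<'a>, String> {
    let (db_id, branch) = match key.split_once('@') {
        Some((db, br)) => (db, Some(br)),
        None => (key, None),
    };
    let bad_branch = match branch {
        Some(b) => b.is_empty() || b.contains('@'),
        None => false,
    };
    if db_id.is_empty() || bad_branch {
        return Err(format!("invalid routing key {:?}", key));
    }
    Ok(Route { db_id, branch })
}
```

The resulting `(db_id, branch)` pair is what the controller looks up to decide between attaching to a resident engine and cold-starting one.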
Single writer per DB, enforced by fencing
Source §4 fixes the model: one writer at a time per database. Server mode enforces it through the commit log's CAS fencing token (04, 06). The listener does not invent its own lock; it relies on the storage-layer fence:
- MUST bind a write-issuing connection to the engine instance that holds the current writer lease for that DB/branch; a second concurrent writer MUST be rejected, not silently queued at a new engine.
- MUST surface a fencing event (a writer that lost the lease) to the client as an ErrorResponse with a defined SQLSTATE (e.g. 55P03 lock_not_available), never as an acked write from a fenced writer.
- SHOULD allow many concurrent read connections per DB, each attached to a snapshot LSN; reads do not contend (source §10: different-row/readers parallelize).
- SHOULD route many independent databases/branches to many engine instances — this is the sharding lever that recovers write lanes (W2, source §8/§9), so routing must make per-DB fan-out cheap.
- MAY proxy a misrouted connection to the node currently holding a DB's writer lease, or return a redirect, rather than starting a competing engine.
Routing must not manufacture a second writer
The dangerous failure is two engine-server nodes each cold-starting a writer for the same DB. The CAS fence makes this safe (the loser is fenced at its first append), but it is wasteful. Routing SHOULD converge writes for a DB onto one node; correctness is guaranteed by the fence either way.
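The fence the listener relies on can be modelled in a few lines. This Rust sketch keeps the lease epoch and the WAL behind one lock, so the check-then-append acts as a single atomic conditional write. It is an in-memory stand-in for the storage layer's CAS (04, 06), not its real implementation, and all names are hypothetical. It replays the two-node race described above: the node whose lease was superseded is fenced at its first append with 55P03.

```rust
use std::sync::Mutex;

struct LogState {
    lease_epoch: u64,        // fencing token of the current writer
    wal: Vec<(u64, String)>, // (writer epoch, record); position + 1 = LSN
}

struct CommitLog {
    state: Mutex<LogState>,
}

impl CommitLog {
    fn new() -> Self {
        CommitLog { state: Mutex::new(LogState { lease_epoch: 0, wal: Vec::new() }) }
    }

    /// A node claims the writer lease; every earlier holder is now fenced.
    fn claim_lease(&self) -> u64 {
        let mut s = self.state.lock().unwrap();
        s.lease_epoch += 1;
        s.lease_epoch
    }

    /// Conditional append: succeeds only if `token` is still the current lease.
    /// Check and write happen under one lock, so together they act as one CAS.
    fn append(&self, token: u64, record: &str) -> Result<u64, &'static str> {
        let mut s = self.state.lock().unwrap();
        if s.lease_epoch != token {
            return Err("55P03"); // surfaced as ErrorResponse; the write is never acked
        }
        s.wal.push((token, record.to_string()));
        Ok(s.wal.len() as u64) // commit LSN the listener may now ack
    }
}
```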
Pooler (server mode only)
A connection pooler — PgBouncer or pgcat — SHOULD sit in front of engine-server in transaction mode to absorb serverless connection bursts (source §1, §5, §6). This exists only in server mode; the embedded path has no sockets and no pool.
Why transaction mode specifically
Serverless clients open many short-lived connections (one per request/invocation). Each backend connection holds engine resources (a session, cached prepared statements, a snapshot). Transaction mode multiplexes N client connections onto a small pool of backend connections, returning a backend to the pool at each transaction boundary rather than holding it for the client's whole lifetime.
| Pool mode | Backend held for | Fit here |
|---|---|---|
| session | the entire client connection | Defeats the purpose — bursty serverless clients would exhaust backends. |
| transaction | one transaction | Target mode. Maximizes reuse under burst; the backend returns to the pool at COMMIT/ROLLBACK. |
| statement | one statement | Breaks multi-statement transactions; too aggressive for OLTP. |
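Transaction-mode pooling comes down to leasing a backend per transaction instead of per client. The toy Rust model below (`TxnPool` is hypothetical) ignores the wait queues, timeouts, and auth that PgBouncer/pgcat handle. It shows why thousands of short transactions fit through a pool sized like the default_pool_size of 20 in the config sketch that follows.

```rust
use std::collections::{HashSet, VecDeque};

struct TxnPool {
    idle: VecDeque<u32>, // backend connection ids not currently in a transaction
}

impl TxnPool {
    fn new(size: u32) -> Self {
        TxnPool { idle: (0..size).collect() }
    }
    /// Transaction start: lease a backend, or None if the client must wait.
    fn begin(&mut self) -> Option<u32> {
        self.idle.pop_front()
    }
    /// COMMIT / ROLLBACK: the backend returns to the pool immediately.
    fn end(&mut self, backend: u32) {
        self.idle.push_back(backend);
    }
}

/// Serve `n` short transactions back to back (e.g. one per serverless
/// invocation) and count the distinct backends they touched.
fn serve_short_txns(pool: &mut TxnPool, n: usize) -> usize {
    let mut used = HashSet::new();
    for _ in 0..n {
        let backend = pool.begin().expect("a backend is free between transactions");
        used.insert(backend);
        pool.end(backend);
    }
    used.len()
}
```

In session mode, by contrast, `end` would only run when the client disconnects, so the 21st concurrently connected client would wait even if every other client were idle.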
Transaction-mode constraints clients must respect
In transaction mode a backend is not pinned to a client across transactions, so session-scoped state (session-level SET, server-side prepared statements not re-prepared per txn, advisory session locks, LISTEN/NOTIFY) does not survive between transactions. Clients and PostgREST MUST avoid relying on cross-transaction session state when pooled in transaction mode. This is standard PgBouncer guidance and applies unchanged here.
Config sketch
# pgbouncer.ini (sketch)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
# the required mode
pool_mode = transaction
# backend conns per (db,user)
default_pool_size = 20
# allow scale-to-zero: no idle backends
min_pool_size = 0
reserve_pool_size = 5
# absorb the burst on the client edge
max_client_conn = 5000
# drop idle backends → engine can idle-stop
server_idle_timeout = 30
server_tls_sslmode = require
# client-side TLS also needs client_tls_cert_file / client_tls_key_file (omitted here)
client_tls_sslmode = require
pgcat is the alternative when sharding/load-balancing across many backend nodes is wanted; it offers the same transaction pooling plus query routing. Either way the pooler is composed, not built (source §6 — "Pooler … No (build from scratch); PgBouncer, pgcat").
- SHOULD deploy a transaction-mode pooler in front of engine-server for any serverless or bursty client population.
- SHOULD set min_pool_size = 0 and a short server_idle_timeout so idle backends drain and the engine can scale to zero (06).
- MUST NOT embed pooling logic in engine-server; keep it a separate composable process.
- SHOULD document the transaction-mode session-state caveats for tool authors and for PostgREST configuration.
TLS & authentication
- MUST support TLS on the pgwire listener via SSLRequest negotiation (reply S, then complete the TLS handshake before StartupMessage).
- MUST support SCRAM-SHA-256 authentication; cleartext/MD5 MAY be offered only over TLS for legacy drivers.
- SHOULD support bearer-token (or mTLS) auth on the HTTP/WS API, mapping to the same principal model as pgwire.
- SHOULD require TLS between the pooler and engine-server and between client and pooler (client_tls_sslmode / server_tls_sslmode = require).
- MAY delegate user/role storage to the composed auth layer where better-auth or an external IdP owns identity (12); the server only needs to verify a credential and resolve a principal.
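The TLS and auth rules above can be expressed as a pure policy check run during startup. In this Rust sketch, `TlsMode` mirrors the tls_mode knob from the Configuration table. The enum and function names are hypothetical.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum TlsMode { Disable, Allow, Require }

#[derive(Clone, Copy, PartialEq, Debug)]
enum AuthMethod { ScramSha256, Md5, Cleartext }

/// Byte sent in reply to SSLRequest: b'S' to proceed with TLS, b'N' to refuse.
fn ssl_reply(mode: TlsMode) -> u8 {
    if mode == TlsMode::Disable { b'N' } else { b'S' }
}

/// Whether startup may continue with the chosen auth method on this connection.
fn auth_allowed(mode: TlsMode, tls_active: bool, method: AuthMethod) -> bool {
    if mode == TlsMode::Require && !tls_active {
        return false; // non-TLS connections are refused outright
    }
    match method {
        AuthMethod::ScramSha256 => true,
        // Legacy methods are weaker than SCRAM (cleartext sends the password
        // itself), so they are only accepted inside TLS.
        AuthMethod::Md5 | AuthMethod::Cleartext => tls_active,
    }
}
```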
Auth is per-connection, authorization is per-statement
The listener authenticates once at startup and resolves a principal; row/table authorization (RLS-style) is the engine's and/or the composed layer's job, not the wire layer's. The wire layer's contract is: prove identity, then forward statements tagged with that principal.
Observability & metrics
Server mode is the only mode with connections, so it owns connection-level observability. Metrics SHOULD be exported in a Prometheus-style scrape endpoint and SHOULD reuse the latency percentiles the benchmark plan demands (p50/p99/p999 — never mean-only; source §8).
- Connection metrics: Active / idle connections, per-DB writer-lease holder, fencing events, auth failures, TLS handshake failures.
- Latency histograms: Per-statement and per-commit p50/p99/p999 via HDR-style histograms (source §8 harness), labelled by simple vs. extended query and read vs. write.
- Engine/lifecycle: Cold-start count and warm time (feeds idle-timeout / keep-warm tuning, source §10 / Exp 5), plus the cache hit ratio passed through from 05.
- Tracing: SHOULD propagate a trace/request id from the client through the listener into engine calls so a slow request can be attributed to wire, engine, cache, or storage.
- SHOULD expose a metrics endpoint with connection, latency-percentile, and lifecycle counters.
- SHOULD emit per-statement structured logs with SQLSTATE on failure for debuggability.
- MUST NOT log statement parameter values at default verbosity (they may carry user data); gate behind an explicit debug flag.
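Percentile reporting is what separates a useful latency metric from a misleading mean. The Rust sketch below computes nearest-rank p50/p99/p999 from raw samples. It stands in for the HDR-style histogram the harness uses, which buckets values instead of storing all of them. `LatencyRecorder` is a hypothetical name.

```rust
struct LatencyRecorder {
    samples_us: Vec<u64>,
}

impl LatencyRecorder {
    fn new() -> Self {
        LatencyRecorder { samples_us: Vec::new() }
    }

    fn record(&mut self, micros: u64) {
        self.samples_us.push(micros);
    }

    /// Nearest-rank percentile: the smallest sample with at least q of the data
    /// at or below it. `per_mille` is 500 for p50, 990 for p99, 999 for p999.
    fn percentile(&self, per_mille: u64) -> Option<u64> {
        if self.samples_us.is_empty() {
            return None;
        }
        let mut sorted = self.samples_us.clone();
        sorted.sort_unstable();
        let n = sorted.len() as u64;
        let rank = ((per_mille * n + 999) / 1000).max(1); // ceil(q * n), at least 1
        Some(sorted[(rank - 1) as usize])
    }

    /// Shown only for contrast; never report this alone.
    fn mean(&self) -> Option<u64> {
        if self.samples_us.is_empty() {
            return None;
        }
        Some(self.samples_us.iter().sum::<u64>() / self.samples_us.len() as u64)
    }
}
```

With 990 samples at 100 µs and 10 at 50 ms, the mean is 599 µs and p99 is 100 µs, while p999 is 50,000 µs. Only p999 exposes the tail.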
Graceful shutdown, drain & backpressure
Drain on shutdown
Because writes ack only after the WAL is durable, shutdown must not strand an in-flight commit. The server drains rather than dropping connections.
SIGTERM received:
1. stop accepting new connections (close listener socket).
2. mark in-flight writes: let each open transaction reach
COMMIT/ROLLBACK or hit drain_timeout.
3. for committed-but-unacked writes: ensure the WAL CAS has
returned and the ack is delivered before closing the conn.
4. flush/handoff the writer lease (so a successor can claim it
without waiting for fence expiry). (06)
5. close idle conns, then close remaining conns at drain_timeout.
6. exit. The engine itself may then idle-stop (scale-to-zero).
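Steps 2 to 5 boil down to one decision per connection. The Rust sketch below assumes drain is evaluated per connection against a single timeout flag, and all names are hypothetical. It force-closes a commit whose CAS has not returned by drain_timeout. That is safe because nothing was acked, though waiting for the CAS to return is an equally valid policy.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConnState {
    Idle,
    InTxn,                                // open transaction, no COMMIT yet
    CommitInFlight { wal_durable: bool }, // COMMIT received; CAS may still be pending
}

#[derive(Debug, PartialEq)]
enum DrainAction { Close, Wait, AckThenClose, ForceClose }

fn drain_action(state: ConnState, past_drain_timeout: bool) -> DrainAction {
    match state {
        ConnState::Idle => DrainAction::Close,
        // Durable commit: the ack must go out before the socket closes.
        ConnState::CommitInFlight { wal_durable: true } => DrainAction::AckThenClose,
        _ if !past_drain_timeout => DrainAction::Wait,
        // Timed out: open transactions are rolled back, and a pending commit is
        // closed without an ack, so the client sees an unknown outcome and
        // never a false success.
        _ => DrainAction::ForceClose,
    }
}
```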
Backpressure
The single-writer-per-DB ceiling means the write path is a single lane; under burst, the server must push back rather than unbounded-buffer (buffering would risk acking from memory, which the durability rule forbids).
- MUST drain in-flight transactions on SIGTERM up to drain_timeout before exiting, and MUST deliver the ack for any write whose WAL became durable during drain.
- MUST hand off or release the writer lease on graceful shutdown so a successor engine can take over without a stale-fence stall.
- MUST apply backpressure (bounded queue, 53300 too_many_connections, or a busy error) instead of buffering writes in memory and acking early; the §8 durability rule overrides throughput.
- SHOULD bound the per-connection in-flight message window and the global concurrent-write queue depth per DB.
- SHOULD coordinate drain with the pooler (pause new server connections) so clients see graceful reconnects, not reset sockets.
Never ack from memory under load
Backpressure is the safe answer to a write firehose; in-memory buffering with early ack is not. A commit is durable only when its WAL CAS succeeds (source §8, Experiment 4 gate). Shedding load with a clear busy error is correct; pretending a buffered write is committed is disqualifying.
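A bounded queue is the simplest way to turn overload into an explicit error. This Rust sketch uses the standard library's bounded channel (`sync_channel` with `try_send`) as the per-DB write queue. `write_queue_depth` mirrors the config knob, and 53300 follows the rule above. Mapping a disconnected writer to 57P01 (admin_shutdown) is an assumption.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

struct WriteRequest {
    sql: String,
}

/// One bounded queue per DB; `write_queue_depth` mirrors the config knob.
fn write_queue(write_queue_depth: usize) -> (SyncSender<WriteRequest>, Receiver<WriteRequest>) {
    sync_channel(write_queue_depth)
}

/// Enqueue without blocking. A full queue becomes a busy error for the client.
/// Nothing is acked here: the ack waits for the writer's durable WAL append.
fn submit(queue: &SyncSender<WriteRequest>, sql: &str) -> Result<(), &'static str> {
    match queue.try_send(WriteRequest { sql: sql.to_string() }) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(_)) => Err("53300"),
        Err(TrySendError::Disconnected(_)) => Err("57P01"), // assumed: writer has shut down
    }
}
```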
Listener interface (engine-server entry points)
The listener calls the same C ABI the embedded path binds, plus a small surface for the extended-query lifecycle. These are the engine-side entry points the listener marshals to; they are not new engine behaviour.
// Same handle type bun:ffi opens in embedded mode.
fn engine_open(storage_url: &str, db: &str, branch: Option<&str>) -> EngineHandle;
fn engine_close(h: EngineHandle);
// Simple query path.
fn engine_query(h: EngineHandle, sql: &str) -> QueryResult;
// Extended query path (Parse/Bind/Describe/Execute).
fn engine_parse(h: EngineHandle, name: &str, sql: &str,
param_types: &[Oid]) -> Result<(), EngineError>;
fn engine_bind(h: EngineHandle, portal: &str, stmt: &str,
params: &[Value], result_formats: &[Format]) -> Result<(), EngineError>;
fn engine_describe(h: EngineHandle, target: DescribeTarget) -> Description;
fn engine_execute(h: EngineHandle, portal: &str, max_rows: u32) -> ExecOutcome;
// Transaction control surfaced to the wire layer.
fn engine_begin(h: EngineHandle) -> TxnStatus;
fn engine_commit(h: EngineHandle) -> Result<Lsn, EngineError>; // Lsn = durable commit point
fn engine_rollback(h: EngineHandle) -> TxnStatus;
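// Out-of-band cancel (maps to pgwire CancelRequest, which arrives on a separate
// connection carrying the BackendKeyData process ID and secret; the listener
// resolves that process ID to the handle).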
fn engine_cancel(h: EngineHandle, secret: u32) -> Result<(), EngineError>;
engine_commit returning the durable Lsn is the bridge between the wire layer and the §8 durability rule: the listener emits CommandComplete / commit_lsn only after this returns, never before.
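The ordering rule above can be sketched in Rust with a hypothetical `Engine` trait standing in for the C ABI. No bytes for the commit leave the listener until `commit` returns. The durable LSN it returns is what the HTTP front door would put in `commit_lsn`. Reporting the session as idle ('I') after a failed commit assumes the failed commit ends the transaction, as it does in Postgres.

```rust
trait Engine {
    /// Ok(lsn) only once the WAL append (CAS) for this commit is durable.
    fn commit(&mut self) -> Result<u64, String>;
}

#[derive(Debug, PartialEq)]
enum Out {
    CommandComplete(&'static str),
    Error(String), // SQLSTATE, sent as ErrorResponse
    Ready(char),
}

/// Returns the messages to send and, on success, the durable commit LSN.
fn handle_commit<E: Engine>(engine: &mut E) -> (Vec<Out>, Option<u64>) {
    // Blocks until the commit is durable or has failed; nothing is sent before.
    match engine.commit() {
        Ok(lsn) => (vec![Out::CommandComplete("COMMIT"), Out::Ready('I')], Some(lsn)),
        Err(sqlstate) => (vec![Out::Error(sqlstate), Out::Ready('I')], None),
    }
}

/// Test double: returns a canned commit outcome.
struct FakeEngine(Result<u64, String>);

impl Engine for FakeEngine {
    fn commit(&mut self) -> Result<u64, String> {
        self.0.clone()
    }
}
```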
Configuration
| Knob | Default (proposed) | Effect |
|---|---|---|
| listen_addr / listen_port | 0.0.0.0:5432 | pgwire TCP listener bind. |
| http_listen | disabled | Enable the alternative HTTP/WS API (edge targets). |
| tls_mode | require | TLS requirement on the pgwire listener (disable/allow/require). |
| auth_method | scram-sha-256 | Primary auth; cleartext/MD5 only over TLS for legacy drivers. |
| server_version | a pg-compatible string | Reported in ParameterStatus; MUST satisfy client probes. |
| max_connections | 500 | Per-node connection ceiling before backpressure (53300). |
| statement_timeout | 30 s | Abort a runaway statement; emits ErrorResponse. |
| idle_in_txn_timeout | 10 s | Reclaim a connection that holds a transaction open while idle (especially behind a pooler). |
| drain_timeout | 30 s | Max time to finish in-flight transactions on SIGTERM before forced close. |
| write_queue_depth | 256 / DB | Bounded in-flight write queue per DB before backpressure; never unbounded. |
| cancel_enabled | true | Honor pgwire CancelRequest via the BackendKeyData secret. |
The pooler has its own config (sketched above) and is provisioned separately; engine-server does not read it.
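The knobs above map naturally onto a typed config struct. Here is a Rust sketch with the proposed defaults. Field names follow the table, and the types are assumptions. server_version is left unset because the table does not pin a value (see Open questions).

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum TlsMode { Disable, Allow, Require }

#[derive(Debug, Clone)]
struct ServerConfig {
    listen_addr: String,
    listen_port: u16,
    http_listen: Option<String>, // None: HTTP/WS API disabled
    tls_mode: TlsMode,
    auth_method: String,
    server_version: Option<String>, // must be set to a value the target clients accept
    max_connections: u32,
    statement_timeout: Duration,
    idle_in_txn_timeout: Duration,
    drain_timeout: Duration,
    write_queue_depth: usize, // per DB; bounded, never unbounded
    cancel_enabled: bool,
}

impl Default for ServerConfig {
    fn default() -> Self {
        ServerConfig {
            listen_addr: "0.0.0.0".to_string(),
            listen_port: 5432,
            http_listen: None,
            tls_mode: TlsMode::Require,
            auth_method: "scram-sha-256".to_string(),
            server_version: None,
            max_connections: 500,
            statement_timeout: Duration::from_secs(30),
            idle_in_txn_timeout: Duration::from_secs(10),
            drain_timeout: Duration::from_secs(30),
            write_queue_depth: 256,
            cancel_enabled: true,
        }
    }
}
```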
Failure modes & edge cases
| Failure | Mechanism | Handling |
|---|---|---|
| Write from a fenced writer | Connection's engine lost the writer lease (another node claimed it). | Engine's CAS append fails (412 at storage, 04); listener emits ErrorResponse SQLSTATE 55P03; the write is NOT acked. Client should reconnect (pooler/router converges to the lease holder). |
| Two nodes cold-start same DB writer | Routing did not converge writes to one node. | Safe by construction: the CAS fence allows only one to commit; the other is fenced at first append. Wasteful, not incorrect; routing SHOULD converge. |
| Client probe query unsupported | Driver issues a catalog/introspection query the subset doesn't answer. | Connect fails or driver misbehaves. Mitigation: enumerate and answer the probe set for Bun.sql/PostgREST/psql (Open questions); return a sensible ErrorResponse for genuinely unsupported SQL. |
| Session state lost across txns | Transaction-mode pooler returned backend to pool mid-session. | Expected pooler behaviour; clients MUST NOT rely on cross-txn session state. Document; PostgREST configured accordingly. |
| SIGKILL (no drain) | Hard kill skips graceful drain. | No acked-write loss: any commit acked was already WAL-durable; any un-acked in-flight commit is replayed or absent on restart (§8 Exp 4). Clients retry; idempotent on commit LSN. |
| Backpressure / overload | Write firehose exceeds the single lane. | Bounded queue then 53300/busy error — shed load, never early-ack from memory. The contended-same-row outlier belongs on coupled Postgres (source §9 / 10). |
| TLS / auth failure | Bad credential or handshake. | ErrorResponse + connection close; counted in metrics; no engine resources allocated. |
| Cold start on first connection | Engine for the routed DB/branch is idle (scale-to-zero). | Listener waits on controller cold-start (process attach + cache warm, source §10); first request pays the warm cost. Tunable via idle-timeout / keep-warm. |
Dependencies & existing pieces to start from
- Wire protocol codec: pgwire (Rust) for the native engine build, or jackc/pgproto3 (Go) for a Go shim (source §6). Provides message framing; this spec maps messages to engine calls.
- Pooler: PgBouncer (transaction mode) or pgcat (transaction mode + sharding/routing), composed in front, not built (source §6).
- Engine library: The unmodified libengine / engine.h from 02 (Engine Core); the listener is built around it as the engine-server output.
- Clients (validation): Bun.sql and PostgREST as the conformance targets (source §5, §11); psql and pgbench for protocol/perf testing (the 09 harness runs in server mode).
- Controller: 06 (Lifecycle & Controller) for cold-start/idle-stop and the writer-lease fencing token the routing depends on.
Acceptance criteria / definition of done
- MUST connect Bun.sql (new SQL("postgres://…")) with zero custom driver code and run both simple and parameterized queries end to end.
- MUST let PostgREST point at the engine, introspect a schema, and serve an auto-generated REST endpoint with no bespoke REST code.
- MUST implement the full extended query path (Parse/Bind/Describe/Execute/Sync/Close), verified against a stock pg driver and psql.
- MUST reject a second concurrent writer for one DB with a fencing ErrorResponse, never an acked write, demonstrated with two nodes racing the same DB.
- MUST pass a drain test: SIGTERM during an in-flight commit delivers the ack iff the WAL became durable, and never strands an acked write (ties to §8 Exp 4).
- SHOULD run pgbench -c N through the pooler in transaction mode and report p50/p99/p999 (feeds 09 Exp 1/2).
- SHOULD demonstrate that the listener adds no engine-behaviour divergence: identical query results between embedded (bun:ffi) and server (pgwire) for a fixed test corpus.
- SHOULD serve the alternative HTTP/WS API returning identical results and SQLSTATE codes to pgwire for the same queries.
Open questions & risks
- Exact introspection probe set. Enumerate the precise catalog/pg_catalog queries Bun.sql, PostgREST, and psql issue on connect, and decide which are answered for real vs. stubbed. This is the make-or-break detail for "wire-compatible."
- server_version value. Which version string keeps all target clients happy without claiming features the subset lacks? Pin and test per client version.
- Writer-lease handoff under the pooler. In transaction mode the backend is shared; how does a write transaction acquire/verify the writer lease per transaction without per-statement fence round-trips?
- Cancel semantics. How does CancelRequest map onto an engine running a single-writer commit: can a commit mid-CAS be safely cancelled, or only pre-CAS statements?
- Routing convergence. Proxy-to-lease-holder vs. redirect vs. let-the-fence-sort-it-out: which keeps writes on one node cheaply without a central directory becoming a bottleneck?
- HTTP/WS transaction handles. Lifetime, idempotency, and timeout of an opaque txn handle across stateless HTTP requests, especially at the edge.
- WASM listener form. On Workers there is no raw TCP and the engine is WASM (source §10); the HTTP/WS API is likely the only viable front door there. Confirm that pgwire is simply absent on that target (11).