Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save jlevy/3652ea46e2ba78833c5cd331a06a3683 to your computer and use it in GitHub Desktop.

Select an option

Save jlevy/3652ea46e2ba78833c5cd331a06a3683 to your computer and use it in GitHub Desktop.
Portable, server-free, concurrent SQLite — research doc (snaplog)
title Portable, Server-Free, Concurrent SQLite for Local Applications
description A domain-agnostic reference on building a locally-friendly, portable, thread-safe, NFS-safe, data-loss-safe usage pattern (package/wrapper/adapter) around SQLite — covering the locking/journaling/filesystem background, the session/changeset extension, source-grounded node:sqlite findings, the prior-art landscape, and a validated snapshot-publish and transaction-envelope design with a working prototype.
author Claude Code agent (finterm), with prior survey work by Joshua Levy and LLM agents

Research: Portable, Server-Free, Concurrent SQLite for Local Applications

Date: 2026-06-06 (last updated 2026-06-06)

Author: Claude Code agent, building on the finterm embeddable-databases survey and a validated standalone spike

Status: Complete (factual survey and a tentative design; prototype validated)

Overview

A domain-agnostic reference on one question:

How do you make a SQLite database safe and convenient to use concurrently, from local applications, with no database server, on any filesystem (including NFS and consumer sync folders), while keeping the file a portable, durable, tool- compatible artifact and never losing more than the current transaction?

SQLite is the obvious archival/interchange format — a single public-domain file readable by nearly every language and tool. But its built-in concurrency story (one writer at a time; WAL needing shared memory; fcntl/flock locks unreliable on network mounts) is a poor fit for “many local agents and processes write one file, sometimes on a network drive, with no daemon.” The gap between “great file format” and “safe concurrent local access with no server” is exactly what this research maps, and it ends with a concrete, prototype-validated design for closing it as a generic wrapper around SQLite.

The motivating use cases are deliberately kept generic — durable local agent state, local data tools, offline-first caches, content-addressed metadata catalogs, build/CI artifacts, “zip-and-go” datasets. (finterm’s own dataroom catalog is one such consumer, but nothing here is specific to it; the design takes a caller-supplied schema and merge policy and knows nothing about the domain.)

This builds directly on a prior finterm survey, Modern Embeddable / File-Based Databases (§G–§J there), and on a from-scratch standalone spike with 55 passing tests and source-grounded checkouts of SQLite and Node. It restates the relevant parts so it stands alone.

What Kind of Artifact This Is

A framing question worth settling up front, because it shapes what the deliverable actually is (and is more fundamental than any product name): if we build something here, what category is it? In short, primarily a storage format and commit protocol — a set of conventions over standard SQLite files — not a wrapper or a new engine.

The accurate answer: primarily a storage format and commit protocol — a set of conventions over standard SQLite files — with per-language reference libraries. Concretely, the durable core is (a) an on-disk layout (immutable versioned snapshots, a current pointer, changeset transaction envelopes, an applied_tx ledger, and a mkdir lease) and (b) the protocol for writing, reconciling, and reading it. The library code is one implementation of that protocol, not the thing itself. The closest precedents are Apache Iceberg and Delta Lake: table formats and commit protocols layered over Parquet files, with multiple engine implementations — nobody calls them “Parquet wrappers.” This is the same shape, over SQLite.

Why the alternative framings mislead:

  • “A wrapper around SQLite” undersells and misdescribes it. A wrapper implies intercepting or replacing the SQLite API (a custom VFS, or a drop-in connection object). This design never touches the SQL surface — you still query with plain SQLite — it orchestrates files around SQLite. It is an overlay/coordination layer, not an API wrapper.
  • “A complete repackaging as a file-based database that uses SQLite” oversells it. There is no new query language and no new file format; the published snapshots are ordinary SQLite. To a consumer it behaves like a small embeddable multi-writer database, but it is not a new engine.
  • “A set of formats and conventions” is the core truth — that is the durable, standardize-and-version deliverable; the libraries follow from it.

Is it Node-specific? No. The format and the entire reader path are language-agnostic: a reader just opens the standard SQLite file named by current, needing zero special support in any language or tool. Even the writer’s changeset envelope is a SQLite-defined binary blob that any session-capable binding can produce or consume. Only the writer/reconcile role needs such a binding — Node node:sqlite is the natural first reference implementation (it ships sessions with zero native deps; Finding G2), with Python (APSW) and Rust (rusqlite) able to implement the same format later. So: a language-agnostic format and reader story, with writer libraries wherever a session-capable binding exists, Node first.

The practical consequence for the rest of this document: the format/protocol is the primary artifact (its layout, manifest schema, application_id/user_version stamping, and reconcile/conflict rules are what must be precise, stable, and versioned), and any package — in any language — is a reference implementation of it.

SQLite Sessions: A Mature but Underused Primitive

Almost every approach in this document leans on one specific SQLite feature — the session/changeset extension — so it is worth foregrounding what it is, why you have probably never used it, and why it is the key that makes the rest possible. (The mechanism in detail is Finding F; this is the orientation.)

What it is. The session extension lets a connection record the exact row-level changes (INSERT/UPDATE/DELETE) made during a unit of work and serialize them into a changeset — an ordinary binary blob you can write to a file. A changeset can later be applied to another database, with a per-change conflict handler deciding OMIT/REPLACE/ABORT. In effect it is logical, row-level replication built into SQLite, where the unit of replication is just a file.

History. The extension has been in SQLite since version 3.13.0 (2016) — it is not new or experimental at the engine level; it is a decade-old, stable part of the codebase maintained by the core SQLite team, with a documented C API (sessionintro) and an accompanying rebaser (for merging divergent changesets) and changegroup (for concatenating them). It was built for the natural “session” use case: capture what changed here, carry it elsewhere, merge it — i.e. database sync.

Why you have probably never used it. It is disabled by default. The feature only compiles in when SQLite is built with -DSQLITE_ENABLE_SESSION -DSQLITE_ENABLE_PREUPDATE_HOOK, and most distributions and bindings ship without those flags — confirmed by source in this pass: macOS system SQLite, the default better-sqlite3 build (deps/defines.gypi), Python’s stdlib sqlite3, and Go’s mattn/go-sqlite3 all omit it. So for years the feature was present in the amalgamation but invisible at the application layer. Compounding this, the famous “distributed SQLite” projects chose other mechanisms — LiteFS and Litestream replicate the WAL byte stream; rqlite and dqlite use Raft over SQL statements — so the changeset (logical, row-level) lane stayed in the mobile / offline-first sync corner (e.g. SQLiteChangesetSync, sqlite-sync, Tencent WCDB, Bloomberg comdb2), out of the server-infrastructure conversation. The net effect: a powerful, mature primitive that very few people reach for.

Why it is newly relevant — and promising. Its visibility is rising precisely because a first-class runtime finally exposes it by default: Node’s built-in node:sqlite ships with SESSION compiled in (confirmed: node/deps/sqlite/sqlite.gyp:24,27), and Python’s APSW and Rust’s rusqlite (session feature) expose it too. That matters here because a changeset is exactly the file-native, server-free, optimistic commit-log this problem needs: writers each emit a uniquely-named changeset file (conflict-free even on NFS), a merge step folds them with a domain conflict policy, and — the key property — readers never need the extension at all, since they just open the resulting standard SQLite file. In other words, the thing that makes “concurrent, server-free, NFS-safe, per-session-optimistic SQLite” tractable has been sitting in the engine for ~10 years; what changed is that it is finally reachable from a zero-dependency local runtime. The rest of this document is, in large part, an argument that this underused primitive is the right foundation — and a check of where its binding-level rough edges (Findings G/G2) constrain a v1.

Requirements

The target properties (the “contours”), restated as testable requirements:

  • R-FORMAT — durable, portable, tool-compatible archival format. The at-rest artifact is a standard SQLite file (or a directory of plain files and standard SQLite files) that any SQLite tool/binding/language can open, today and in years. No engine-specific or proprietary container as the interchange surface.
  • R-LOCAL — direct local access via in-process APIs. First-class Node, with a path to Python and Rust (and the sqlite3 CLI) for at least the reader role. No network client required to read or write.
  • R-XPLAT — cross-platform. macOS, Linux, Windows; common CPU architectures.
  • R-NOSERVER — no traditional server/daemon. Any process just opens files. A minimal, ephemeral, auto-managed coordinator is acceptable only as an optional optimization, never required — cross-platform daemon lifecycle management is itself a liability.
  • R-FS — filesystem-agnostic, including NFS/SMB/FUSE/sync. Correct where POSIX flock/fcntl byte-range locks and WAL shared memory do not work.
  • R-DURABLE — bounded loss and hot-copy-safe. A crash, an interrupted write, or a file copy taken while the DB is live loses at most the current/in-flight transaction — never a previously-committed state, never the whole dataset. A repair path exists for anomalies (stale locks, recovering a hot copy).
  • R-CONCURRENCY — wrapper-managed, session/optimistic, not one-big-lock. A thin wrapper API may manage concurrency, but the goal is optimistic, session-/transaction-grained concurrency (worst-case loss = one session), not a single coarse whole-DB mutex held across work.

Requirement review (refinements and one addition). These are sound and mutually achievable, with three notes:

  1. R-CONCURRENCY “optimistic, per-session” and R-NOSERVER together force a commit-log/MVCC-style design, not a lock-wrapper. A pure wrapper-lock (even a good one) is still one-writer-at-a-time; true per-session optimism on a single file with no server is achieved by logical row-level changesets merged optimistically, not by finer file locks (see Findings B). So “wrapper-based locking” is best read as “wrapper-managed concurrency,” whose best form is a changeset commit-log.
  2. R-FS and R-CONCURRENCY mean the live multi-writer working directory is a directory of files, even though the archival artifact is a single SQLite file. These are not in tension: the live form has a tail of pending change-files that a finalize/seal step folds into one clean file. Worth stating explicitly as part of R-FORMAT.
  3. Add R-DETECT — no silent corruption. Implied by R-DURABLE but worth elevating: every inconsistency (torn copy, partial write, bit-rot, dangling reference) must be detectable and surfaced loudly, not silently accepted. Content hashing, integrity checks, and structural “in-flight tells” make this cheap.

Questions to Answer

  1. Is SQLite the right at-rest format, and what makes a SQLite file portable/archival?
  2. What exactly breaks when you use SQLite concurrently, and on NFS/SMB/sync folders?
  3. Why is there “no magic file lock,” and what does that imply for the design?
  4. What does the existing landscape (LiteFS, rqlite, dqlite, mvSQLite, Litestream, libSQL/Turso, CRDT layers, mobile sync) actually do, and what’s the gap?
  5. How does SQLite’s session/changeset extension work, and is it the right primitive?
  6. What are the concrete binding constraints (Node node:sqlite vs better-sqlite3 vs bun:sqlite; Python; Rust)?
  7. What design satisfies all the requirements, and does a prototype bear it out?

Scope

In scope: in-process/local concurrent access to a single logical SQLite store with no required server; locking/journaling/filesystem behavior; the session/changeset extension; the snapshot/commit-log pattern; crash/durability/hot-copy safety; cross-language reader portability; a generic wrapper design and prototype.

Out of scope: analytical/columnar engines (DuckDB/Parquet) and full-text/vector search (covered in the prior survey); choosing a specific application schema; remote sync/replication and object-storage transport (a separable layer); and any single application’s domain model.

Findings

A. SQLite is the right at-rest format; what makes it portable

SQLite is the strongest database archival format in wide use: a single file, public domain, a documented and decades-stable on-disk format, a US Library of Congress recommended storage format, and readable by ubiquitous tooling (sqlite3 CLI, Datasette, every major language’s stdlib or a maintained binding). It also bundles FTS5 and JSON1, so metadata, key/value (a trivial kv(key PRIMARY KEY, value) table), and lexical search can live in one inspectable file. The on-disk format is cross-platform and endianness-independent.

Portability is mostly a discipline, not luck:

  • Store room-relative, slash-separated UTF-8 paths in any path columns; never absolute or platform-specific paths (a real bug observed in qmd, whose init recorded absolute paths).
  • Keep the authoritative tables on core SQLite features only. FTS5 is broadly available; loadable extensions (e.g. sqlite-vec’s vec0 virtual table), custom collations, and compile-time options are not guaranteed. A database that embeds a vec0 table is no longer openable by a stock reader (touching the table throws no such module: vec0). Vectors/large indexes belong in separate, rebuildable files.
  • macOS system SQLite ships SQLITE_OMIT_LOAD_EXTENSION, so never assume extension loading — probe it.
  • A packaged/sealed file must be finalized (no -wal/-shm side files) via checkpoint, the backup API, or VACUUM INTO, and pass PRAGMA integrity_check.
  • Stamp PRAGMA application_id and PRAGMA user_version so foreign readers can recognize and version-gate the file.

Conclusion: keep the content/bytes and the interchange artifact as standard SQLite (plus plain files for large blobs); never hand an outside consumer an engine-specific KV store (LMDB/RocksDB/redb) — those are fine only as a rebuildable local cache.

B. “Locking” is three different things; the fine-grained part is never the filesystem

Most confusion comes from one word spanning three independent layers:

  1. File/OS ownershipwhich process may write the file: flock / fcntl / LockFileEx / mkdir / dotfile. Coarse, whole-file.
  2. In-engine concurrency controlhow concurrent transactions isolate: MVCC, two-phase locking, a lock manager with table/page/row granularity. This is the “per-row / per-section” part.
  3. Durability / commit — WAL / rollback journal.

The key realization: layer 2 — the fine-grained part — is an in-memory data structure owned by one process. It is not, and never has been, a filesystem feature. No database does per-row locking by taking a filesystem lock per row. Every engine that allows concurrent local writers uses a shared-memory lock table (Postgres' lock table in shared memory; InnoDB’s in the buffer pool; Berkeley DB’s lock subsystem in a shared-memory environment; LMDB’s reader table and write mutex in lock.mdb; SQLite’s WAL-index in -shm). Shared memory is same-host only — it does not cross machines.

This yields the cross-host theorem: fine-grained locking requires a per-lock acquisition; across hosts “acquire lock” is a network round-trip, and a service that answers network lock requests is a database server. Therefore you cannot have fine-grained, cross-host locking without a server. Across hosts you stop holding locks and turn the lock into a commit — an atomic-create/lease to elect a writer, or a compare-and-swap pointer-swap (optimistic concurrency) to commit a change — held for milliseconds, not across the work.

Better-than-flock local primitives exist (mmap atomic-CAS slots; Linux robust futexes/mutexes with EOWNERDEAD recovery; Windows named mutexes with WAIT_ABANDONED; OFD byte-range locks F_OFD_SETLK) but they all inherit the network wall, and robust mutexes have a macOS gap (historically unimplemented), so a portable shared-memory locker can’t rely on them. The honest consequence for our requirements: don’t look for a cleverer file lock; put the serialization point in the right place — a millisecond commit/election, made correct by idempotency, not by lock perfection.

C. SQLite’s own locking and journaling, precisely

  • Default (rollback journal) mode uses fcntl byte-range locks at fixed offsets ("lock bytes") plus a -journal file for crash atomicity. One writer at a time; readers blocked during the brief write commit.
  • WAL mode adds a -wal file and a -shm shared-memory wal-index, giving concurrent readers and one writer with much less blocking. Per SQLite’s own WAL docs, the -shm wal-index does not work on a network filesystem, so WAL is local-disk-only.
  • busy_timeout + BEGIN IMMEDIATE + retry lets multiple local processes write one WAL DB safely with brief contention. On local disk this already mostly “just works.”
  • URI flags: nolock=1 disables SQLite’s file locking entirely (safe only when you guarantee a single writer / own an external mutex); immutable=1 opens a file promised not to change (fast, safe reads of a finalized snapshot).
  • Never treat copying an open .db as a backup. Use the online backup API or VACUUM INTO, then integrity_check/quick_check. A naive cp/rsync/tar/sync of a live WAL DB can capture a torn page state and is oblivious to -wal/-shm — the exact reason replication tools (Litestream) exist instead of file copy.

D. Why NFS/SMB/FUSE/sync folders break SQLite — a proven pitfall

Two layers fail on network/sync storage: WAL’s -shm shared memory does not exist there, and fcntl/flock byte-range locks are unreliable (notably flaky for flock-on-NFS on macOS). Consumer cloud-sync folders (Dropbox, iCloud Drive, OneDrive, Google Drive) are the worst case: the sync client copies the .sqlite mid-write and ignores -wal/-shm, corrupting it.

This is not hypothetical. The beads issue tracker (bd) shipped a SQLite backend that was effectively unusable on Claude Code Cloud agent workspaces because they run on network drives; the project removed SQLite entirely, moved its working store to Dolt (still “single-writer only, file locking enforced”) with an optional server for sharing, and kept a plain git-committed .jsonl as the portable interchange surface. Its successor (tbd) went fully file/git-native. The lesson: a live, network-mounted SQLite file as a shared multi-writer store is the specific thing to avoid. The safe patterns are (a) a plain file-native interchange layer as the portable truth and (b) confining any live single-writer DB to local disk or a coordinator — never a shared network-drive DB file.

The one primitive that is reliable across local disk and NFS is atomic directory create (mkdir(2)) and atomic rename — which is why the git/npm world uses mkdir lock dirs rather than flock. The design leans entirely on these.

E. The prior-art landscape — and the gap

Existing “distributed/concurrent SQLite” efforts cluster into a few shapes, none of which is a thin, packaged, portable wrapper for the local+NFS+no-server case:

Project / approach What it is Where it sits
WAL + busy_timeout + retry SQLite’s built-in local concurrency local-disk only; no NFS; one writer
libSQL BEGIN CONCURRENT MVCC/OCC multi-writer, still a SQLite file same -shm limit; experimental; local
LiteFS (Fly.io) FUSE FS replicating SQLite; lease-based single primary ephemeral elected coordinator (a server, productized)
rqlite / dqlite HTTP/Raft (rqlite) or embeddable C Raft (dqlite) over SQLite long-lived consensus server
mvSQLite SQLite storage swapped onto FoundationDB; distributed MVCC replace the storage layer (a cluster)
Litestream single local writer + streaming WAL replication to object store durability/replication, not concurrency
cr-sqlite (Vlcn) CRDT extension on SQLite dormant; a merge layer, not a store
Mobile changeset sync (SQLiteChangesetSync, sqlite-sync, WCDB) offline-first device sync via the session extension the closest analogue — but app/device-specific
DIY proper-lockfile + write queue a generic cross-process lock around a SQLite file the thin end; folk knowledge, not a marquee package

The serious multi-node efforts all centralize the writer (LiteFS lease, rqlite/dqlite Raft) or replace the storage layer (mvSQLite). The thin “external lock around a shared/NFS SQLite file” is DIY folk knowledge — there is no marquee packaged library, partly because the SQLite project actively discourages SQLite-on-NFS. The genuinely file-native row-level mechanism (the session/changeset extension) lived in the mobile-offline-sync corner, out of the “distributed SQLite” spotlight, because it was compile-time-disabled in most builds — until node:sqlite exposed it first-class.

The gap: a small, dependency-light, cross-platform package that gives optimistic, session-grained, no-server, NFS-safe, hot-copy-safe concurrent access while keeping the artifact a standard SQLite file. That is the thing worth building.

F. The SQLite session / changeset extension — the file-native primitive

The session extension (in SQLite since 3.13.0, behind -DSQLITE_ENABLE_SESSION -DSQLITE_ENABLE_PREUPDATE_HOOK) records the row changes a connection makes and packages them into a changeset (or smaller patchset) — just a binary blob / file:

  • sqlite3session_create + sqlite3session_attach(NULL) → record all tables.
  • run normal INSERT/UPDATE/DELETE.
  • sqlite3session_changeset → extract the changes as a blob.
  • sqlite3changeset_apply(target, blob, xFilter, xConflict, …) → apply to another DB, with a filter (per-table include/exclude) and a conflict handler returning OMIT / REPLACE / ABORT per change.
  • sqlite3changegroup / changeset_concat combine; changeset_invert undoes; the rebaser (sqlite3rebaser_*, EXPERIMENTAL) rebases one writer’s changeset onto another’s already-resolved conflicts.

Conflict types the handler sees: DATA (target row changed out from under the writer), NOTFOUND (no row with that PK), CONFLICT (PK/UNIQUE violation on insert), CONSTRAINT, FOREIGN_KEY. This is where merge policy lives (e.g. cache = last-writer- wins by timestamp; edges = set-union; events = append).

Changeset vs patchset: a changeset carries old+new values, so it can detect DATA conflicts; a patchset is smaller (PK and new values) but cannot. Measured (update 1 of 4 columns × 500 rows): changeset 15009 B vs patchset 10009 B (~33% smaller). For a metadata store where big bytes live as files (so changesets are already small), the conflict detection is worth more than the size — use changesets.

Documented limitations, and why they fit a clean design:

  • Every table needs a non-null PRIMARY KEY — rows with NULL PK are silently ignored. (So: declare explicit PKs everywhere.)
  • Virtual tables are not captured (FTS5/sqlite-vec). (So: those are rebuildable projections, excluded from the merge and rebuilt after — exactly where they belong.)
  • DDL/schema changes are not captured. (So: migrations are a separate deterministic step; all writers must share a schema version.)
  • Big values are embedded in the changeset. (So: keep large bytes as files; catalog rows stay small and so do changesets.)

Binding availability (the practical crux): because a changeset is just a file, readers never need the extension — they open the SQLite file normally with any binding. Only the writer (to record) and the merge step (to apply) need it:

  • Node: built-in node:sqlite has it (verified). better-sqlite3’s bundled build does not enable SQLITE_ENABLE_SESSION; bun:sqlite lacks the session API and Bun does not expose node:sqlite.
  • Python: stdlib sqlite3 lacks it; APSW has full session support.
  • Rust: rusqlite exposes it behind its session feature.

G. Source-grounded node:sqlite findings (the binding, not SQLite, is the limit)

Verified empirically and confirmed against checkouts of sqlite/sqlite (002d33d, ext/session/sqlite3session.h) and nodejs/node @ v22.22.2 (src/node_sqlite.cc):

  1. node:sqlite’s conflict handler receives only the integer conflict type — not the table name or row values. The C xConflict(pCtx, eConflict, pIter) has the iterator pIter (from which sqlite3changeset_op yields the table name), but the binding passes JS only eConflict (node_sqlite.cc: conflictCallback(eConflict)). So per-table policy cannot be routed inside the conflict handler — you must use the applyChangeset filter(tableName) callback, which does receive the table name, running one filtered apply pass per policy action. (This is a binding limitation; an upstream feature request to forward the iterator/table name is the clean fix.)
  2. SQLITE_CHANGESET_ABORT works and node:sqlite signals it as a false return value, not a throw. Per the C docs, ABORT “rolls back any changes applied so far and returns SQLITE_ABORT”; the binding maps that to applyChangeset(...) === false (node_sqlite.cc: “this is not an error, return false”). Verified: a two-insert changeset where one row conflicts, under ABORT, returns false and rolls back both; under OMIT, returns true and keeps the non-conflicting row. So a strict/quarantine policy is detected by the boolean, then the whole transaction is rolled back.
  3. REPLACE is only legal for DATA/CONFLICT conflicts — returning it for NOTFOUND/CONSTRAINT/FOREIGN_KEY is SQLITE_MISUSE (a throw). So last-writer-wins must be conflict-type-aware (REPLACE on DATA/CONFLICT, OMIT otherwise). (A latent bug caught by reading the header; covered by a NOTFOUND test.)
  4. No sqlite3_rebaser in the binding (the C API has it; node_sqlite.cc does not expose it). So a v1 design must avoid needing a rebaser.
  5. The session API works on the repo’s Node v22 baseline (v22.22.2), though node:sqlite is still behind --experimental-sqlite, so production use should gate on it leaving experimental (or a native fallback).

All five are pinned to C source in the prototype’s SOURCES.md (SQLite ext/session/sqlite3session.c:4952-4978 for the conflict dispatch and MISUSE rules; sqlite3session.h:192-202 for the PK/NULL-PK rules and :1592-1622 for the rebaser; src/wal.c:129-133 for “WAL is not supported on a network filesystem”; src/os_unix.c:1191 “POSIX advisory locks are broken by design”; and node/src/node_sqlite.cc for the binding behavior, with node/deps/sqlite/sqlite.gyp:24,27 confirming SESSION/PREUPDATE are compiled in).

G2. Bindings compared, and the lock prior art (source-grounded)

A second source pass checked the two other obvious building blocks:

better-sqlite3 (v12.10.0, bundles SQLite 3.53.2). Its actual compile flags (deps/defines.gypi) enable FTS5/JSON1/DESERIALIZE/THREADSAFE=2/USE_URI=0 but neither SQLITE_ENABLE_SESSION nor SQLITE_ENABLE_PREUPDATE_HOOK — the session code is #if-compiled-out — and it exposes no session/changeset/rebaser JS API (only prepare/transaction/pragma/backup/serialize/function/aggregate/loadExtension). It compiles the standard amalgamation, so its on-disk file is the identical SQLite format 3, cross-readable with node:sqlite/CLI/Python/Rust. Verdict: not usable as the changeset writer/reconcile runtime without a custom native rebuild (re-introducing the prebuild burden node:sqlite avoids), but fully usable as a reader and as the direct-mode writer (which needs no session), and a good reference for the driver API shape. A CatalogDriver abstraction should require node:sqlite only for the changeset path and allow either binding elsewhere.

node-proper-lockfile (v4.1.2). It locks with mkdir ("atomic on any kind of file system, even network based ones") and explicitly avoids O_EXCL/flock because “O_EXCL is broken on NFS”; it uses an mtime lease and heartbeat with a compromised threshold and mtime-precision probing for coarse-granularity filesystems. This is independent confirmation that mkdir, mtime-lease, and heartbeat is the correct, battle-tested pattern — the exact shape of the spike’s lock. Verdict: don’t depend on it directly (it is async/Promise-based while the publish path is synchronous, pulls a graceful-fs dep, and has no owner-token fencing — only an onCompromised notification), but adopt its mtime-precision handling and default constants (stale ≥ 5000, update = stale/2) on top of the spike’s owner-token revalidate-before-publish fencing. Crucially, proper-lockfile alone does not make a live SQLite DB NFS-safe — it only protects the lock — which is why the design still needs the immutable-snapshot + idempotent-ledger model so correctness does not depend on the lock at all.

H. The design pattern that satisfies the requirements: snapshot-publish and envelopes

Combining the above, the pattern that meets every requirement with no server and no fragile file lock is optimistic concurrency over an append-only changeset commit-log, with immutable published snapshots selected by an atomic pointer. On disk:

<dir>/
  current                         # tiny text pointer → the published snapshot version
  snapshots/
    000000000000.sqlite           # immutable, versioned published heads (standard SQLite)
    000000000001.sqlite
  tx/
    pending/<txid>.txn/           # one transaction envelope per writer commit
      manifest.json               #   tx_id, writer_id, base_version, schema_version+hash,
                                  #   changeset_sha256, logical_clock
      changeset                   #   the SQLite changeset blob
      COMMITTED                   #   marker written LAST → the tx becomes visible atomically
    applied/<txid>.txn/           # archived after folding into a published snapshot
    quarantine/<txid>.txn/        # corrupt / schema-mismatch / strict-conflict (evidence kept)
  leases/<token>.lease            # read-leases pinning a version against GC
  publish.lock/                   # mkdir lease lock (owner token); held only during publish

Protocol:

  • Write (lock-free). Seed a session from the current immutable snapshot (safe — it is never mutated), run the caller’s work, extract a changeset, and write an envelope with the COMMITTED marker last. Unique paths ⇒ conflict-free concurrent writes on local disk and NFS; partial/uncommitted envelopes are ignored.
  • Reconcile (out-of-band). Build the next snapshot in a temp file from the current snapshot and committed pending envelopes (skipping any already in the snapshot’s applied_tx ledger; quarantining digest/schema-mismatched or strict-conflicting ones), run quick_check, then publish by atomically swapping current (write current.tmp, rename). Building is lock-free; only the short publish (re-check pointer and rename) is fenced by the mkdir lease lock, with optimistic concurrency on current.
  • Read. Open only the immutable snapshot named by current — any binding, any language, no session extension, and (since it never changes) safe to immutable=1.

Why this satisfies each requirement:

  • R-DURABLE / hot-copy-safe / R-DETECT. Published heads are immutable and the pointer swap is atomic, so a crash, an interrupted write, or a file copy taken while live can never damage a committed state — the worst case rolls back to the last quick_check-validated current, losing at most the uncommitted tail (≤ one session per writer). A naive hot copy captures, at worst, some extra pending envelopes (harmless — ignored unless COMMITTED) and a consistent older snapshot. Content digests, integrity_check, and structural “in-flight tells” make every anomaly detectable.
  • R-CONCURRENCY. Writers never block each other (unique envelopes); only genuinely conflicting transactions are resolved by policy at reconcile (last-writer-wins / set- union / strict-quarantine). This is per-session optimism, not a whole-DB lock.
  • R-FS / R-NOSERVER. The only shared mutations are atomic file/dir creates and one atomic rename — reliable on local disk and NFS. No WAL -shm, no fcntl, no daemon.
  • R-FORMAT / R-LOCAL / R-XPLAT. Every published snapshot is a standard SQLite file; readers are universal; node:sqlite is zero-dependency so install is trivial cross- platform.

Crucially, correctness does not depend on the lock being perfect. The mkdir lease is belt-and-suspenders: a per-snapshot applied_tx ledger (written in the same transaction that applies each changeset) makes folding exactly-once, and two overlapping reconcilers build separate temp files and commit via optimistic concurrency on current (the loser retries to a no-op). A stale-lock break or a process killed mid-publish therefore cannot corrupt or double-apply. This is what makes a portable design feasible where a perfect cross-platform lock is not.

I. The “easy local” variant (same format, simpler, any binding)

For the common single-writer-on-local-disk case, the same immutable-snapshot format supports a direct strategy: one writer holds the publish lock and does copy → mutate → publish synchronously — no changesets, no envelopes, no reconcile, and no session extension even for the writer, so it runs on any SQLite binding (better-sqlite3, bun:sqlite). It is a coarse single-writer serializer (the lock is held across the short write), but still immutable-snapshot-safe (R-DURABLE) and NFS-safe (mkdir and atomic rename). Because it writes the identical on-disk shape, a room can be written directly when alone and read/operated by the concurrent engine later — two write strategies, one format. This is the “legacy/simple, easier-locking-locally” option alongside the full concurrent engine.

J. Experiments run, and results

A from-scratch, dependency-free spike (node:sqlite, Node v22.22.2) implements the design as a generic package (caller supplies schema and per-table policy) and is exercised by 55 passing tests, including real multi-process (child_process) concurrency:

  • Concurrency: N lock-free writer processes lose zero committed tx after reconcile; racing reconciler processes publish exactly-once with an integrity_check-clean head; readers looping during concurrent publishes never see a torn or regressing snapshot.
  • Crash recovery at every publish step (after-build, after-snapshot, after-pointer) recovers to exactly-once; a published snapshot is never lost.
  • True SIGKILL: a child killed while holding publish.lock does not wedge the store (the stale lease is reclaimed; recovery is exactly-once); a reconciler killed mid-build orphans only a temp.
  • Randomized chaos: writers, reconcilers, and readers all at once converge with zero loss, no double-apply, integrity-clean head (stable across repeated runs).
  • Fencing/lease: revalidate-before-publish aborts a holder whose lease was stolen; a stale lock is reclaimable; OCC on the pointer turns a superseded-base reconcile into a safe no-op.
  • GC and read-lease: a snapshot pinned by an active lease survives GC; an abandoned lease self-clears; a packager reads a byte-stable pinned snapshot while current advances.
  • Portability: a published/sealed head begins with SQLite format 3\0 and carries application_id/user_version; opens read-only with no session API.
  • doctor/repair: restores the last consistent point from a torn room (re-publish to the highest integrity-clean snapshot, roll back orphans, quarantine uncommitted envelopes, clear transients and the stale lock).
  • Direct mode and interop: synchronous direct writes; a directly-written room is read by the concurrent engine and vice versa; N direct-writer processes serialize with zero loss.

Performance (indicative, small metadata rows on local tmpfs):

  • Write (envelope, default snapshot-seed): ~4–5 ms/write at a few-hundred-row catalog, growing with catalog size because the writer copies the current snapshot to seed its session.
  • Seeding crossover (2 KB rows): snapshot-seed 10.5 → 28.6 ms/write as the catalog grows 500 → 5500 rows, while an opt-in schema-only seed (build an empty schema-only working DB, valid for inserts of new keys) stays flat at ~16 ms/write. So: default to snapshot-seed when small/for any mutation; switch insert-heavy writers to schema-only once the catalog is large.
  • Reconcile: ~6 ms/tx (one base copy per pass + per-tx apply + ledger insert + quick_check).

K. Read/write concurrency and isolation — exact guarantees

Exactly what each model offers, and where the snapshot-publish design lands.

WAL (the fair baseline). WAL is the strong-isolation incumbent: many readers run concurrently and lock-free (each sees a consistent snapshot as of its transaction start — snapshot isolation), and a reader never blocks the writer or vice versa. But there is exactly one writer at a time, and the write lock is whole-database — a second writer gets SQLITE_BUSY and must wait/retry. Isolation is serializable for writers (each write transaction sees the latest committed state and is fully isolated). The price: it needs shared memory (-shm), so it is local-disk only, and a live file copy is not safe (R-FS and R-DURABLE both fail). So WAL = best isolation, worst portability, single writer.

The snapshot-publish design, stated precisely. A “write” has two phases with very different concurrency:

  • Reads: unlimited and never blocked. Every reader opens an immutable published snapshot (optionally immutable=1), so any number of readers across any processes/hosts proceed in parallel and are never blocked by writers, by reconcile, or by each other. Isolation is snapshot isolation at the granularity of a published version — a reader sees a single consistent committed snapshot N. (Same guarantee class as WAL readers, and strictly safer because readers never touch a live/mutating file.) The one difference from WAL: a just- committed write is not visible until reconcile publishes it (a visibility lag, not an isolation violation).
  • Write capture: unlimited and lock-free. Any number of writers, in any processes, on any hosts, record their changes and drop envelopes simultaneously with no write lock. This is more write concurrency than WAL, not less — there is no global write lock on the application path.
  • The only serialized step is the out-of-band merge (reconcile), fenced by the mkdir lease and OCC on the pointer. It is brief, runs on its own cadence, and writers and readers never block on it. It serializes the production of a new published snapshot, not the acceptance of independent writes. (And even two merges racing are safe — OCC; the loser retries to a no-op.)

Conflict granularity is per-row, not whole-DB and not even per-table. Because merging is done with changesets, two writers that touch different rows never conflict — even in the same table — and both land with zero loss. Two writers that touch different tables never conflict. A conflict arises only when two writers touch the same primary key, and it is resolved by that table’s policy: lww (last-writer-wins by deterministic order), union (keep first / drop duplicate), or strict (quarantine the losing tx). Changesets carry old+new values, so the engine also detects a genuine DATA conflict — two writers that each modified the same existing row from the same base — not just insert collisions. So the write-concurrency granularity the requirements asked for ("different tables/sections instead of a global write lock") is already exceeded: it is row-level.

Isolation: honestly weaker than serializable. The design is not serializable across writers. It tracks each transaction’s write set (the rows it changed), not its read set. So it detects write-write conflicts on the same row, but it cannot detect a read-write anti-dependency — e.g. writer A reads row X and writes row Y accordingly while writer B changes X; both commit and the X→Y invariant can be violated. WAL’s single-writer serializability does not have this hole. The honest framing: the design provides “row-level optimistic concurrency with write-set conflict detection and read-your-own-writes within a session”, not full serializable isolation. Applications that need a cross-row invariant must put those rows under a strict policy and use retry (below), or accept last-writer-wins semantics, or keep that invariant in a single-writer (DirectStore) lane.

Optimistic concurrency and granular retry — yes, at the row/transaction level. The model is optimistic at row granularity (writers never lock; conflicts are detected at merge by PK and old-value). Granular retry is available in three escalating forms:

  1. Policy resolution (no retry needed): for lww/union tables the merge deterministically resolves same-row conflicts — the common case needs no retry at all.
  2. Application-level OCC retry (recommended for strict): on a quarantined/aborted tx, a writeWithRetry(fn) helper re-seeds from the new current snapshot and re-runs the caller’s fn against fresh data, up to N times — a true per-transaction optimistic retry loop (the classic compare-and-retry). This is straightforward to add and is the right answer for “I need this write to see the latest state.”
  3. Engine rebase (future): SQLite’s sqlite3rebaser_* can rebase a conflicting changeset onto the already-resolved base and re-apply per change, the finest-grained merge. node:sqlite does not expose it yet (Finding G), so v1 uses (1)+(2); (3) is the enhancement that would let even strict conflicts auto-merge without re-running fn.

Could writes be even more parallel — independent “sections”? The merge is already not the write bottleneck (writes are lock-free; the merge is a small, out-of-band pointer swap). If it ever were, the store can be sharded into independent stores (e.g. one per table-group), each with its own current pointer and reconcile — giving fully independent publish lanes — at the cost of no cross-shard atomic snapshot. This is a real option but unnecessary unless profiling shows merge contention; it is listed as a possible extension, not a v1 need.

Summary of the trade. Versus WAL: this design offers strictly more read concurrency in practice (readers never touch a live file) and far more write concurrency (many lock-free writers and row-level conflicts vs one global writer), and it works on NFS and survives hot copies — in exchange for weaker isolation (row-level merge with write-set conflict detection, plus a reconcile visibility lag) instead of WAL’s serializable single-writer. That trade is the whole point: it buys the requirements (R-FS, R-DURABLE, R-CONCURRENCY) at the cost of serializability, which the target use cases (caches, agent state, content- addressed metadata) do not need.

Key Insights

  1. Stop hunting for a better file lock. The fine-grained part of “locking” is always in-memory and same-host; cross-host it must become a commit. Put the serialization at a millisecond atomic pointer-swap and make it idempotent — then lock perfection is unnecessary.
  2. A changeset is the OCC commit-log in SQLite-native form. It turns “concurrent writers” into “each writer drops a uniquely-named file” and “a merge step folds them,” all plain files, conflict-free even on NFS — exactly the per-session optimism the requirements ask for.
  3. Immutability is what buys hot-copy and crash safety for free. Because published heads never mutate and the pointer swap is atomic, any copy of the directory yields a consistent older state plus an ignorable pending tail. R-DURABLE falls out of the data model, not out of careful locking.
  4. Readers are universal; only writers/merge need the extension. This keeps the tri-language and tool-compatibility goal intact — Python/Rust/sqlite3 read a published snapshot today; the session requirement is confined to the writer/reconcile runtime.
  5. The constraints are in the binding, not SQLite. node:sqlite’s missing table-name in the conflict handler, ABORT-as-false, and missing rebaser are all binding-level; the C API is richer. This shapes v1 (filter-based per-table policy; constrained writable surface so no rebaser is needed) and gives a clean upstream ask.
  6. Two write strategies, one format cleanly serves both “many agents, maybe on NFS” and “one local tool, keep it dead simple” without forking the on-disk format or the reader path.

Comparison Matrix

Concurrency approaches for a single SQLite store with no server. Read vs write concurrency stated separately, with the conflict/lock granularity and the isolation each provides (see Finding K for the precise definitions and caveats):

Approach Read concurrency Write concurrency Conflict / lock granularity Isolation Server-free Any FS (incl. NFS) Hot-copy-safe
Single connection, rollback journal (default) readers exclude the writer 1 writer whole-DB serializable ✗ (locks)
WAL + busy_timeout many, lock-free (snapshot) 1 writer (global) whole-DB write lock writer serializable; readers snapshot ✗ (-shm)
libSQL BEGIN CONCURRENT many many (commit-time check) page-level snapshot + OCC ✗ (-shm)
Wrapper mkdir-lock + rollback (≈ DirectStore) many (immutable snapshots) 1 writer (lease) whole-DB each publish atomic
Elected coordinator (LiteFS) many 1 primary (replicas read-only) engine-level serializable ✗ (FUSE daemon + lease) n/a — own FUSE FS + replication n/a (replicated)
Raft server (rqlite/dqlite) many many engine-level serializable ✗ (cluster) n/a — replicates over the network n/a (replicated)
Snapshot-publish + changeset log (this doc) unlimited, lock-free (snapshot per version) unlimited at capture + one out-of-band merge per-row (per-PK), per-table policy row-level write-set merge + DATA-conflict detection (not serializable) ✓✓

A fairness note on the “Any FS” column: LiteFS and rqlite/dqlite are not “unsafe on NFS” — they don’t use a shared filesystem at all. LiteFS gives each node its own FUSE filesystem and replicates writes from a single elected primary; rqlite/dqlite replicate via Raft over the network. Both require a running process (a daemon/cluster), which is exactly the cost the server-free approaches avoid. So they solve a different problem (multi-node replication) rather than “share one file on whatever disk you happen to be on.”

The headline contrast: WAL gives you the strongest isolation (true serializable) but a single global writer and only on local disk; this design gives you many concurrent writers resolved at row granularity and works on any filesystem / survives hot copies, but with weaker (merge-based) isolation. They sit at opposite ends of the concurrency-vs-isolation trade, and the right pick depends on whether you need serializability or many-writer, any-FS, and hot-copy safety with no server. (DirectStore is the middle: WAL’s single-writer simplicity but any-FS/hot-copy-safe and on any binding.)

Options Considered

Option A — Thin external write-lock wrapper (the §I “Tier-1” wrapper)

A cross-process mkdir/flock lease + BEGIN IMMEDIATE + nolock/rollback-journal + open-fresh-on-NFS around each transaction.

Pros: tiny (~40 lines over an existing lock); correct for modest local/NFS writes; keeps one standard SQLite file. Cons: it is the coarse one-writer-at-a-time model the requirements want to avoid; the lease-break/coherence hazards on NFS are managed, not eliminated; no per-session optimism. Good as a fallback or the “direct mode” floor, not as the primary concurrency story.

Option B — Ephemeral elected coordinator (LiteFS pattern)

First participant wins a lease, opens the DB, serves a socket; others RPC in; lease auto- releases on death.

Pros: real in-memory MVCC, strong immediate consistency, cross-host. Cons: there is a process (transient, but a process); cross-platform socket/lifecycle management is exactly the daemon-seamlessness liability R-NOSERVER warns about; more moving parts. Best reserved as a future optimization for hot, strongly-consistent cross-host work.

Option C — Snapshot-publish and transaction-envelope changesets (recommended)

The Findings-H design: lock-free changeset writers, immutable snapshots, atomic pointer, idempotent reconcile, mkdir lease only around publish; plus the Findings-I direct mode for the simple local case.

Pros: satisfies every requirement (per-session optimism; no server; NFS-safe; hot-copy and crash safe; standard-SQLite readers; cross-platform via node:sqlite); validated by 55 multi-process tests; correctness independent of lock perfection. Cons: writers/reconcile need the session extension (Node node:sqlite / Python APSW / Rust rusqlite), which on Node is still --experimental; reader latency lags the pending tail until reconcile; per-write snapshot-seed cost grows with catalog size (mitigated by schema-only seed / compaction); no auto-rebase merge in v1 (constrained writable surface instead).

Eliminated options

  • Raft servers (rqlite/dqlite), mvSQLite/FoundationDB: violate R-NOSERVER; cluster- scale answers to a local-scale problem.
  • CRDT layer (cr-sqlite): dormant; a merge layer, not a store; over-general for our conflict needs.
  • Pure-KV engines (LMDB/RocksDB/redb) for the interchange artifact: fail R-FORMAT (engine-specific, poor external tooling); fine only as a rebuildable local cache.
  • A custom VFS that intercepts xLock/xShmMap: native C, large build, and the serialization point can live at one app-level chokepoint anyway — transparency buys little here.

Recommendations

Build a small, dependency-free, cross-platform package that implements Option C, with this surface:

const store = new SnapshotStore(dir, {
  schema: SCHEMA_SQL,            // caller DDL applied to snapshot 0; every table NON-NULL PK
  schemaVersion: 1,
  appId: 0x........,             // PRAGMA application_id
  conflictPolicy: {              // per-table merge semantics
    cache_table: 'lww',          //   last-writer-wins (REPLACE on DATA/CONFLICT, OMIT else)
    edges: 'union',              //   set-union (OMIT duplicates)
    '*': 'strict',               //   deny-by-default: a conflict quarantines the tx
  },
}).init()

const txId = store.write(writerId, (db) => { /* INSERT/UPDATE/DELETE */ })  // lock-free envelope
const rows = store.read((db) => db.prepare('…').all())                       // immutable snapshot
const res  = store.reconcile()      // build → quick_check → atomic publish; exactly-once
const lease = store.readLease()     // pin current for a packager; lease.release() after
store.gc({ retain: 3 }); store.validate(); store.repair(); store.seal('out.sqlite')

// Easy-local alternative over the same format (no session extension, any binding):
const v = new DirectStore(dir, opts).write((db) => { /* … */ })

Module/code outline (what a reproducer needs to build):

  • lockwithLock(lockPath, fn(token), {timeoutMs>staleMs, staleMs, pollMs}) + ownerOf(lockPath) + heartbeat: the NFS-safe mkdir lease lock with an owner token, held only around publish; supports revalidate-before-publish (fencing). Invariant timeoutMs > staleMs. ~50 lines.
  • driver — thin node:sqlite wrapper: open/openReadonly, recordChangeset(db, fn) (session over a single tx), vacuumInto, quickCheck, integrityCheck. The only place the session extension is touched; lazy-loaded for write/reconcile.
  • conflictPolicynormalize(policy) and applyWithPolicy(db, changeset, policy): per-table policy via one filtered applyChangeset pass per action (the binding gives the table name only to filter), conflict-type-aware 'replace', and 'strict' detected via the false return of an ABORT pass → roll back the whole tx → quarantine.
  • snapshotStore — the engine: init (snapshot 0 with schema + application_id/ user_version + an applied_tx ledger table), read, write (seed-from-snapshot or seed:'schema-only'; envelope with COMMITTED last; manifest carries schema_version+schema_hash+changeset_sha256), reconcile (build-temp → apply non- ledgered committed tx with the policy → quick_check → atomic publish under the lease, with OCC re-check + exactly-once via applied_tx), gc (retain N + leased), readLease, validate (classify sealed/live/in-flight-capture/corrupt from the structural tells), repair (restore last consistent point), seal (VACUUM INTO finalize).
  • directStore — the easy-local strategy: write(fn) = lock + copy current → mutate → quick_check → atomic publish, with OCC; reuses the same layout/read.
  • CLI — schema-agnostic info/validate/gc/seal/repair (these operate on the snapshot/pointer/envelope layer, so they need no schema or policy — an operator can maintain a store they did not create). validate exits 0/2/3 for sealed-or-live / in-flight / corrupt.
  • Tests — port the spike’s multi-process harness (writer/reconciler/reader/direct- writer/lock-holder child processes): zero-loss concurrency, racing reconcilers exactly- once, readers-never-torn, crash-at-every-publish-step, true SIGKILL, randomized chaos, fencing/lease, GC+lease, portability (magic header + pragmas), repair, direct-mode interop.

Prototype is done. A complete reference implementation of all of the above lives at explorations/spikes/snaplog-standalone-package/ (the finterm repo) with 55 passing tests, a runnable demo.mjs, a worked generic-consumer example, and SOURCES.md recording the exact upstream source lines. It is intentionally domain-agnostic — the prototype’s tests inject a toy schema, and a separate example injects an unrelated one — so it transfers directly to a standalone package.

Recommended sequencing for productionizing:

  1. TypeScript port of the prototype (types; lazy node:sqlite; the same tests).
  2. Cross-platform install/runtime matrix (macOS arm64/x64, Linux glibc/musl x64/arm64, Windows x64) installing the packed tarball, plus a real NFSv4.1/EFS/Filestore gate (the one thing a local spike cannot settle — mkdir/rename atomicity under contention, actimeo/close-to-open, lease-break under partition, fsync durability).
  3. Schema migration across snapshot versions (deterministic DDL step; reconcile quarantines schema-mismatch via schema_hash).
  4. Cheaper seeding for large catalogs (schema-only default for insert-heavy writers, or a shared read-only base and overlay) and snapshot compaction/retention.
  5. Optional later: Python (APSW) / Rust (rusqlite) writer parity against the same envelope format; an ephemeral-coordinator mode for hot strongly-consistent cross-host work; upstream node:sqlite asks (forward the conflict iterator/table name; expose the rebaser).

Open design questions to settle with maintainers: changeset vs patchset (recommend changeset); rebaser-gap handling (recommend constrained writable surface for v1); direct-mode strictly opt-in (never inferred from filesystem detection — the dangerous cases look local); and whether bit-rot repair should replay tx/applied/ onto the prior snapshot rather than fall back to it.

Next Steps

  • Decide whether to extract the prototype into a standalone open-source package and under what name/scope.
  • TypeScript port, plus the cross-platform and real-NFS test matrices.
  • Schema migration, cheaper seeding, and compaction.
  • File the upstream node:sqlite enhancement requests (conflict iterator/table name; rebaser).
  • Evaluate Python/Rust writer parity if multi-language writing (not just reading) is needed.

Methodology

Empirical probes were run against node:sqlite (Node v22.22.2) and cross-checked against upstream source checked out per the repo’s third-party-checkout workflow. Four repos were read at the source level (commits in the prototype’s SOURCES.md): sqlite/sqlite (002d33d) for the session/locking/WAL semantics (ext/session/sqlite3session.{h,c}, src/wal.c, src/os_unix.c); nodejs/node (v22.22.2, SQLite 3.51.2) for the binding (src/node_sqlite.cc, deps/sqlite/sqlite.gyp); WiseLibs/better-sqlite3 (f93f490, v12.10.0, SQLite 3.53.2) for binding/format/session-availability (deps/defines.gypi, lib/); and moxystudio/node-proper-lockfile (9f8c303, v4.1.2) for the mkdir-lease prior art (lib/lockfile.js, README.md). Each technical claim in this doc carries a file:line reference verifiable against those checkouts. A complete prototype with 55 multi-process tests was built and run end-to-end (zero-loss concurrency, racing reconcilers, crash-at-every-publish-step, SIGKILL, chaos, fencing, GC/lease, portability, repair, direct-mode interop) plus performance/seeding/patchset benchmarks. The prior-art landscape and the locking/journaling/filesystem background also draw on the finterm embeddable-databases survey (2026-06-04, §G–§J); point-in-time version/maturity facts should be re-verified before relying on them.

References

SQLite — format, locking, journaling, sessions (primary):

Bindings:

Concurrency / locking mechanisms (background):

Prior art — distributed/concurrent/replicated SQLite:

Commit-log / OCC pattern (the cross-host “lock becomes a commit”):

finterm internal background (motivation and deeper survey):

  • Modern Embeddable / File-Based Databases survey (§G concurrency theorem, §H beads network-drive pitfall, §I write-lock wrapper, §J session/changeset) — the parent survey this generalizes
  • Prototype: explorations/spikes/snaplog-standalone-package/ (55 tests; SOURCES.md records the exact SQLite/node:sqlite source lines) · earlier in-domain proofs explorations/spikes/dataroom-changeset-catalog/ and dataroom-snapshot-catalog/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment