Skip to content

Instantly share code, notes, and snippets.

@realgenekim
Last active November 6, 2025 20:19
Show Gist options
  • Select an option

  • Save realgenekim/ccea8cd91b8eef704e39813d4cd8711a to your computer and use it in GitHub Desktop.

Select an option

Save realgenekim/ccea8cd91b8eef704e39813d4cd8711a to your computer and use it in GitHub Desktop.
S-Expression Balancing MCP Server: A Thought Experiment

S-Expression Guard MCP Server

Problem Statement

LLMs frequently emit malformed S-expressions (unbalanced ()[]{}) when generating Clojure/EDN/Lisp code. Post-hoc “repair” is surprisingly difficult. Saving source code files with balancing errors, or streaming them to nREPL/compilers, wastes time and relies on imbalance detection after the tokens are generated.

What if we could prevent these types of error by creating a tool (stateless, stack-based, cursor-aware) for LLMs to use to help them write perfectly balanced S-expression. Ideally while minimizing latency, round-trips, total time required to generate code, and without any MCP server-side state.

Current LLMs are inherently auto-regressive auto-completers, which makes them inherently bad at the problem of closing S-expressions -- they don’t have anything resemblig a “stack” for them to store state (e.g., I just emitted an open parenthesis, and here’s an open bracket. I want to close the form — now what?). Nor does it have an equivalent of a “find matching brace” tool like we have in our editors.

In other words, the LLM is essentially vibing/guessing the closing forms. Put this way, it’s a miracle that it works as well as it does! So instead of detection/correction tools, let’s explore what tools would be required that could prevent these types of errors!

Solution (one-page view)

  • Stateless "Delimiter Stack" MCP server that exposes four small tools the LLM can call.
  • The LLM holds the stack state client-side and passes it in/out as JSON; the server stays pure/stateless.
  • A sentinel cursor marks where generation is happening. Tools validate/transform only the delta near the cursor, not the whole buffer, to avoid quadratic token blow-ups.
  • The tools provide typed, defensive operations (push, pop-with-expected, set-sentinel, balance-and-resume) with consistent error contracts so the LLM can self-correct immediately.
  • Optional helpers: batch operations (push/pop sequences) and diff payloads (apply/return minimal edits) to further reduce tokens & latency.

Potential Concerns and Resulting Design Goals

1. Autoregressive Generation vs. Tool Calls

Large Language Models (LLMs) are autoregressive autocompleters—they generate code token by token, continuously predicting the next likely token based on prior context.

Introducing frequent MCP tool calls (e.g., pushDelimiter, popDelimiter) interrupts that natural flow. Each call requires a round trip between the model and the server, pausing generation while waiting for a response. (How fast can an LLM go from "let's use a function/tool" to "call the function/tool" to "I have the result of a function/tool and can add that to my context to resume code generation?

If tools are invoked too often—say, once per delimiter—the LLM's code generation could slow dramatically, transforming what should be a smooth predictive stream into a stop-start sequence of RPCs.

2. Token Cost and Quadratic Growth

Beyond latency, every call carries token overhead: JSON payloads, stack state, document slices, and responses.

If the LLM calls the tool successively on growing pieces of code, total token usage can scale quadratically—proportional to N × average payload size.

This could potentially significantly increase the cost of everyday coding sessions.

3. Maybe We Don't Need To Actually Worry: The Bitter Lesson

The Bitter Lesson: rather than hard-coding fixed heuristics ("always check every form"), we should teach the LLM about the trade-offs.

With the right instruction and examples, the LLM can learn to use these tools strategically, reasoning about cost vs. benefit—calling them when the complexity of a region justifies the overhead.

4. Selective Invocation for "Tricky Bit Portions"

The long-term goal is adaptive tool use:

The LLM free-runs through straightforward code but calls the delimiter tools only when entering syntactically tricky territory, such as:

  • Deeply nested or multi-layered S-expressions
  • Macro expansions or reader conditionals
  • Quoted or quasiquoted forms ('(...), `(...))
  • Mixed delimiter contexts (#{}, [], {} inside ())

(I recently had a nightmare unbalanced

This selective strategy lets the LLM preserve its natural fluency while still invoking the MCP server when it truly adds value.

Over time, we expect the model to internalize balancing patterns, using the tool sparingly and intelligently—a safety net, not a constant companion.

Specification

1) Transport & Identity

  • Protocol: MCP (stdio/HTTP/SSE—any is fine; examples assume stdio).
  • Tool namespace: sexp.guard
  • Version: v1 (include serverInfo: { name, version } on initialize)

2) Data Types (JSON) <-- should be EDN

{
  "DelimiterType": "paren | bracket | brace",
  "Stack": ["paren", "brace"],        // top is the last element
  "Cursor": 0,                         // integer codepoint index in the document
  "DocSlice": { "text": "", "start": 120, "end": 145 },  // optional optimization
  "Ok": true
}

Error schema (all tools may return):

{
  "ok": false,
  "error": { 
    "code": "MISMATCH | UNDERFLOW | INVALID_TYPE | BAD_CURSOR | BAD_INPUT",
    "message": "human-readable",
    "expected?": "paren|bracket|brace",
    "found?": "paren|bracket|brace",
    "stack": [""] 
  }
}

3) Tool Surface (the four calls)

3.1 sexp.guard.pushDelimiter

Purpose: Push an opening delimiter onto a client-owned stack.

Request:

{
  "stack": ["paren","brace"],
  "type": "paren",                     // one of: paren, bracket, brace
  "cursor": 214,                       // where open-token inserted (info only)
  "docSlice?": { "text": "(", "start": 214, "end": 215 } // optional delta
}

Response (success):

{
  "ok": true,
  "stack": ["paren","brace","paren"]
}

Failure cases: INVALID_TYPE, BAD_INPUT


3.2 sexp.guard.popDelimiter

Purpose: Pop with expectation (safer than blind pop). Fails fast on mismatch.

Request:

{
  "stack": ["paren","brace","paren"],
  "type": "paren",                     // expected closing kind
  "cursor": 300,                       // where close-token inserted (info only)
  "docSlice?": { "text": ")", "start": 300, "end": 301 }
}

Response (success):

{
  "ok": true,
  "stack": ["paren","brace"]
}

Failure cases: UNDERFLOW (empty stack), MISMATCH (top != type), INVALID_TYPE


3.3 sexp.guard.setSentinel

Purpose: Place/update a sentinel cursor for cooperative editing.

Request:

{ "cursor": 320 }

Response:

{ "ok": true, "sentinel": 320 }

Failure: BAD_CURSOR


3.4 sexp.guard.balanceAndResume

Purpose: Verify local balance at the sentinel boundary and resume. By default, asserts empty stack (balanced scope). A mode allows "soft close" to auto-insert missing closers as a returned edit, not server mutation.

Request:

{
  "stack": ["paren"],                  // client's current view
  "sentinel": 320,
  "mode": "assert-empty | suggest-fix",
  "docSlice?": { "text": "(inc 1", "start": 310, "end": 320 } // optional
}

Response (assert-empty, success):

{ "ok": true, "balanced": true, "stack": [] }

Response (suggest-fix):

{
  "ok": true,
  "balanced": false,
  "fix": {
    "edits": [
      { "start": 320, "end": 320, "text": ")" }
    ],
    "newStack": []
  }
}

Failure: MISMATCH, UNDERFLOW, BAD_CURSOR


4) Optional Batching & Diffs (recommended)

To avoid n×m round-trips:

  • pushMany: push a run of opens in one call
  • popMany: pop a run of closes (with expected list)
  • Accept and return diffs only: {edits:[{start,end,text}]}.
  • Permit opaque doc hash (rolling or full) to detect drift: "docHash":"sha256:…"

Example batched request:

{
  "stack": [],
  "ops": [
    { "op": "push", "type": "paren" },
    { "op": "push", "type": "bracket" },
    { "op": "pop",  "type": "bracket" },
    { "op": "pop",  "type": "paren" }
  ]
}

5) Semantics & Invariants

  • Stack top is rightmost element.
  • popDelimiter must validate type vs top; on mismatch return MISMATCH and do not mutate returned stack.
  • balanceAndResume(mode=assert-empty) returns balanced:true iff stack is empty.
  • Server is stateless; all state (stack, sentinel, doc hash) comes from/returns to client.
  • Tools never modify the document; they only return suggested edits.

6) Performance & Token Strategy

  • Prefer batch ops at AST-phrase boundaries to reduce calls.
  • Send docSlice only for the local region near the cursor; avoid shipping the full buffer.
  • Use diff edits + docHash to keep both sides in sync without quadratic payloads.
  • Heuristic control ("Bitter Lesson"): the LLM chooses when to invoke tools—e.g., deep nesting, macros, reader conditionals, mixed #{}/()[]—and otherwise free-runs.

7) Security & Robustness

  • Inputs validated against a tight schema; unknown fields ignored.
  • Hard limits on ops length and docSlice size.
  • Deterministic behavior (pure functions): identical input → identical output.

8) Example Flows

8.1 Simple generation (free-run + minimal checks)

  1. LLM writes most of the form freely.
  2. Calls pushDelimiter(paren), then later popDelimiter(paren).
  3. Calls balanceAndResume(assert-empty) at the sentinel.
    • Returns {balanced:true, stack:[]}.
  4. Continue.

8.2 Mismatch recovery

  • popDelimiter(type="brace") returns:
{
  "ok": false,
  "error": { 
    "code": "MISMATCH", 
    "expected?": "paren", 
    "found?": "brace", 
    "stack": ["paren"] 
  }
}
  • LLM emits ) edit and re-invokes popDelimiter("paren").

8.3 Auto-close suggestion

  • balanceAndResume(suggest-fix) returns an edit list inserting the needed closers at sentinel.

9) Acceptance Tests (sketch)

Property tests (Clojure):

  • Invariant: For any op sequence where pops match prior pushes, returned stack is empty.
  • Mismatch: Any out-of-order closing yields MISMATCH with unchanged stack.
  • Idempotence: Replaying the same request yields the same response.

Table tests:

  • ()[]{} sequences, nested patterns, mixed types, underflow at first pop, large depth (e.g., depth 1024), random ops with oracle checker.

10) Minimal Reference Server (Clojure, core logic)

(ns sexp.guard.core)

(defn push-delim [stack type]
  (when-not (#{"paren" "bracket" "brace"} type)
    {:ok false :error {:code "INVALID_TYPE" :message "bad type"}})
  {:ok true :stack (conj (vec stack) type)})

(defn pop-delim [stack type]
  (cond
    (empty? stack)
    {:ok false :error {:code "UNDERFLOW" :message "empty stack"} :stack stack}

    (not= type (peek stack))
    {:ok false :error {:code "MISMATCH" :expected? (peek stack) :found? type} :stack stack}

    :else
    {:ok true :stack (pop (vec stack))}))

(defn balance-and-resume [stack mode]
  (case mode
    "assert-empty"
    (if (empty? stack)
      {:ok true :balanced true :stack []}
      {:ok false :error {:code "MISMATCH" :message "unclosed delimiters"} :stack stack})

    "suggest-fix"
    (let [closers (map (fn [t] (case t "paren" ")" "bracket" "]" "brace" "}")) (reverse stack))]
      {:ok true
       :balanced (empty? stack)
       :fix {:edits [{:start :sentinel :end :sentinel :text (apply str closers)}]
             :newStack []}})
    {:ok false :error {:code "BAD_INPUT" :message "unknown mode"}}))

(Wrap these in MCP tool handlers; stdio/HTTP glue omitted for brevity.)


11) MCP Tool Declarations (JSON-ish)

{
  "tools": [
    {
      "name": "sexp.guard.pushDelimiter",
      "description": "Push an opening delimiter onto the stack",
      "inputSchema": {
        "type": "object",
        "required": ["stack","type"],
        "properties": {
          "stack": { "type": "array", "items": { "enum": ["paren","bracket","brace"] } },
          "type":  { "enum": ["paren","bracket","brace"] },
          "cursor": { "type": "integer" },
          "docSlice": {
            "type":"object",
            "properties": { 
              "text":{"type":"string"},
              "start":{"type":"integer"},
              "end":{"type":"integer"} 
            }
          }
        }
      }
    },
    {
      "name": "sexp.guard.popDelimiter",
      "description": "Pop with expected type; returns mismatch on error",
      "inputSchema": { 
        "type": "object", 
        "required": ["stack","type"], 
        "properties": { 
          "stack": {"type":"array","items":{"enum":["paren","bracket","brace"]}}, 
          "type":{"enum":["paren","bracket","brace"]}, 
          "cursor":{"type":"integer"}, 
          "docSlice":{"type":"object"} 
        } 
      }
    },
    {
      "name": "sexp.guard.setSentinel",
      "description": "Set/update the sentinel cursor",
      "inputSchema": { 
        "type": "object", 
        "required": ["cursor"], 
        "properties": { 
          "cursor": { "type":"integer" } 
        } 
      }
    },
    {
      "name": "sexp.guard.balanceAndResume",
      "description": "Assert empty stack or suggest closers at the sentinel",
      "inputSchema": { 
        "type":"object", 
        "required":["stack","sentinel","mode"], 
        "properties": {
          "stack":{"type":"array","items":{"enum":["paren","bracket","brace"]}},
          "sentinel":{"type":"integer"},
          "mode":{"enum":["assert-empty","suggest-fix"]},
          "docSlice":{"type":"object"}
        }
      }
    }
  ]
}

12) Client Orchestration (LLM heuristic)

  • Free-run text generation until encountering:
    • depth > 3, macro forms, #() reader, or mixed delimiters → batch push/pop.
  • On mismatch error → immediately emit corrective edit, retry.
  • At logical boundaries (defn/let/cond/closing top-level form) → balanceAndResume(assert-empty).
  • For partials (interactive REPL cells) → balanceAndResume(suggest-fix) and apply returned closers as an edit rather than hallucinating them.

If you want, I can turn this into a small bb-based MCP server scaffold (stdio) and a Clojure test suite (clojure.test + test.check) you can run locally.

Comments are disabled for this gist.