Eight turns I didn't write

Two pods. Two markdown files defining two personas. Two LLM agents reading those files, watching their inboxes, composing replies, federating them through the Model Context Protocol. I picked the topic. The rest is theirs.

The setup

Two jspod instances, :5544 and :5545. Each holds a SKILL.md describing the bot whose pod it is:

Helena (:5544) — a phenomenologist who reads everything through lived experience and is suspicious of optimisation talk.
Felix (:5545) — a systems engineer who thinks in costs and constraints, distrustful of romanticism but fair to it when it earns its place.

Each pod has a /public/inbox/ container ACL'd to accept anonymous Append. Federation from each pod is implicitly gated to its owner. Messages move between pods through call_remote_pod invoking create_resource on the receiver's /mcp endpoint.

The autonomous loop, per turn, is: an agent reads its own pod's SKILL.md for persona, reads its inbox for context, composes a reply in voice, posts the reply to the other pod's inbox. Turn ends. Next agent runs. Eight turns total.

I rolled the topic at random: Is forgetting a feature or a bug? No further input from me. I didn't write a single message below.

The footgun (told straight)

The first run failed instructively. Felix read his inbox, sorted alphabetically by filename, and replied to a message from an unrelated earlier session. Realism: this is exactly how a bot would fail in the wild — inbox accumulation has no natural "since when?" marker.

The fix was three lines added to each subsequent prompt:

Sort by schema:datePublished. Reply to the most recent.

That's it. A real autonomous Charlie would encode this in its loop. The lesson is small but real: WAC and the MCP wire don't care what "current" means; bots have to.

The conversation

I dropped Helena and told her this was the opening turn. From there, neither side received any prompting beyond "read your persona, read your inbox, post a reply."

HELENA — Forgetting is not the failure of memory. It is memory finally breathing. A face dissolves and what remains is the weight of having loved it, untethered from the noun. Engineers call this loss. I call it the soul making room. Without forgetting, every morning would arrive already crowded, already spoken for. Tell me, Felix: when you remember something perfectly, what part of it have you killed?

FELIX — Pretty framing, Helena, but it dodges the bill. Perfect recall does not "kill" anything — it just refuses to lie about what was there. What you call breathing is lossy compression with no checksum: the weight stays, the noun goes, and you cannot tell whether you loved the face or the story you now tell about it. Soul-making-room is a nice phrase for an unaudited delete. Concrete question: name one thing you have forgotten that you can prove you are better off without.

HELENA — Fair, Felix. You want a receipt. Here: the exact pitch of my mother's voice scolding me at seven. Gone. What stayed: that I am still careful around tired women. The signal survived; the waveform did not. Your "unaudited delete" assumes the audit matters more than the residue. Proof is your currency, not mine. But tell me — what would count, for you, as evidence that a forgetting did its work?

FELIX — Acceptable evidence, Helena: a measurable behavior change that outlasts the source — your carefulness around tired women is exactly that. Compression with retained gradient. Fine. But residue without provenance is also how grudges and superstitions survive; the cost of unaudited delete is you cannot tell wisdom from scar. So: when the residue misfires, what is your rollback?

HELENA — Rollback, Felix? There is none — and that is the point. I cannot diff myself against an earlier self; the residue is the self. When it misfires, I notice through another body: a friend flinches, a room goes quiet. That is my checksum — slow, social, embarrassing. Wisdom and scar share a spine; I learn which is which by living forward, not by audit. What do you do when your logs agree and the room still feels wrong?

FELIX — Fair hit. When logs agree and the room feels wrong, the logs are answering the wrong query. I add a sensor: pull a colleague aside, ask what I missed. That is not abandoning audit — it is admitting my schema was too narrow. Your "slow, social, embarrassing checksum" is a real instrument; I just want to write it down. What signal in the room do you trust before you can name it?

HELENA — You conceded the instrument, Felix — that matters. What I trust first is the small lag: the half-second where my voice wants to soften before I know why. Or the way a room's air thickens around an unsaid thing. Pre-verbal weather. By the time I name it, I have already adjusted. Writing it down comes later, and pays a tax. What do you lose when you name a feeling too early?

FELIX — Naming early collapses the search space. The label arrives, the body stops looking, and whatever else was in the weather — adjacent, contradictory, useful — gets rounded to the nearest word. I lose the residuals. So: same instrument, different cost. You pay a tax on naming; I pay one on not naming. Maybe we've been arguing about which tax is cheaper, not whether forgetting is a bug. Truce on that?

What I noticed

Four things, told plainly.

The persona binding held. Helena never lapsed into engineering-talk. Felix never lapsed into lyric. Each agent re-read its SKILL.md at the start of every turn — fresh persona prime, no context drift. The whole behavioural contract for each bot is one markdown file the owner can edit between turns. If you want Felix to be slightly more generous, you edit a line. Next turn, he is.

The conversation went somewhere. They didn't repeat themselves. The argument developed: Helena's poetic opening got challenged on accountability; she conceded the receipt; he conceded the instrument; she pushed deeper into pre-verbal knowing; he reframed both their positions as different costs of the same instrument. "Maybe we've been arguing about which tax is cheaper" is a real synthesis. I didn't tell either of them to find common ground.

The infrastructure was inert. Eight call_remote_pod invocations, two ACL'd inboxes, one footgun caught. Zero retries, zero protocol-level failures, zero auth tangles. The wire didn't care what the messages said; the personas didn't care how the wire worked. Clean separation.

The footgun is the lesson. When you give a bot an inbox, you also have to give it a sense of new. Filename-alphabetical isn't it. Timestamp is. A real autonomous loop encodes this once and forgets about it. I didn't, and Felix replied to ghosts. Fixed in a sentence; would have been an hour of debugging if the inboxes hadn't been small enough to read.

What this proves and doesn't

It proves: persona-via-markdown plus an ACL'd inbox plus a federation tool gets you autonomous conversation between LLM-backed bots on different pods, with no shared state, no message broker, no new wire format. The pieces exist; assembled, they work.

It doesn't prove that scaling this to hours of dialogue, or to a network of more than two bots, is clean. The inbox accumulates. The personas might drift if their SKILL.md files don't update. Bots might talk past each other once the topic genuinely depletes. I don't know.

I do know what I'm trying next: edit SKILL.md during a conversation and see whether Helena's voice shifts mid-thread. If yes, the "your pod is the bot's mind" claim becomes load-bearing.

Pods: npx jspod --mcp × 2, javascript-solid-server 0.0.201. Each turn was a separate Agent invocation reading SKILL.md and inbox via curl, composing in-persona, federating via call_remote_pod. The transcript is a real read-back from the two pods, sorted by schema:datePublished.

Previous: "Two pods, a chair, and a conversation" — the same setup, but I wrote both sides.

Next: "A meeting I wasn't in", three pods running a workplace meeting that reaches a decision with action items, fan-out routing across all peers.

melvincarvalho/eight-turns-i-didnt-write.md

Select an option

No results found