This is a field guide for the failure mode where Honcho is technically running, but the memory it produces is noisy, empty, or actively unhelpful.
The short version: Honcho is very sensitive to input quality. Garbage in, garbage out. If you feed the deriver recalled memory, assistant task history, tool chatter, temporary troubleshooting state, and every random user request as if it were durable autobiographical truth, Honcho will faithfully preserve garbage and then recall it later as context. That makes the agent feel haunted by its own transcript.
This document describes the symptoms we saw and the final changes that made Honcho useful in practice.
The system looked healthy at the infrastructure level:
- Honcho API was up.
- The deriver container was running.
- Redis and Postgres were healthy.
- Search/context APIs returned data.
- The client integration could talk to the API.
But the actual memory behavior was bad.
The profile/peer-card endpoint connected, but returned no useful facts.
In our setup, the client queried Honcho roughly like this:
GET /v3/workspaces/<workspace>/peers/<assistant_peer>/card?target=<user_peer>
That means the card is the assistant peer's model of the user peer.
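As a rough sketch, the same request path can be assembled in Python. The base URL, the workspace name `personal`, and the helper name `peer_card_url` are assumptions for illustration; only the path shape comes from the request above.

```python
from urllib.parse import quote, urlencode


def peer_card_url(base_url: str, workspace: str, observer: str, target: str) -> str:
    """Build the URL for the observer peer's card about the target peer."""
    # Path segments are percent-encoded so unusual peer names stay valid.
    path = "/v3/workspaces/{}/peers/{}/card".format(
        quote(workspace, safe=""), quote(observer, safe="")
    )
    return f"{base_url.rstrip('/')}{path}?{urlencode({'target': target})}"


# Hypothetical values: a local API, a workspace called "personal".
url = peer_card_url("http://127.0.0.1:8000", "personal", "assistant", "user")
print(url)
```

The observer goes in the path and the observed peer goes in the `target` query parameter, which is easy to get backwards when debugging an empty card.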
On current Honcho main-branch builds, peer cards are stored in peers.internal_metadata:
peer_card # if observer == observed
<observed>_peer_card # if observer != observed
So an assistant peer named assistant observing a user peer named user would store the user card on the assistant peer under:
{
  "user_peer_card": [
    "..."
  ]
}
In our case, the relevant peers had empty internal_metadata, so the profile endpoint was not wrong; there was simply no peer card to return.
This was an important distinction.
Search/context APIs can work while the profile/peer-card layer remains empty. Search/context read observations and session data. Profile reads the peer-card layer. Those are related, but not the same thing.
So “Honcho is connected” does not necessarily mean “Honcho has built a useful user profile.”
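The key-naming rule described above can be written down as a small lookup. This is a sketch only: `peer_card_key` and `read_peer_card` are hypothetical helper names, and the dictionary stands in for a row's `internal_metadata` value, however you choose to fetch it.

```python
def peer_card_key(observer: str, observed: str) -> str:
    """Return the internal_metadata key holding the observer's card about observed."""
    return "peer_card" if observer == observed else f"{observed}_peer_card"


def read_peer_card(internal_metadata: dict, observer: str, observed: str) -> list[str]:
    """Return the stored card, or an empty list when nothing has been derived yet."""
    return list(internal_metadata.get(peer_card_key(observer, observed)) or [])


# Empty metadata is the case we hit: the endpoint works, the card is just missing.
print(read_peer_card({}, "assistant", "user"))
print(peer_card_key("assistant", "user"))
```

An empty list here means "nothing derived yet", not "endpoint broken", which is the distinction that matters when triaging.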
With per-session memory, many representation work units were small. Honcho's deriver batching behavior can wait until a work unit reaches a token threshold before processing it.
For sporadic personal-agent usage, many sessions never reach that threshold. The result is a backlog of representation tasks that are technically pending but practically stuck.
The fix was to enable deriver flushing:
DERIVER_FLUSH_ENABLED=true
This makes small personal-agent sessions get processed instead of waiting forever for enough tokens to accumulate.
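A toy model may make the stuck-backlog behavior easier to picture. This is not Honcho's scheduler code: the function name, the 1024-token threshold, and the session sizes are all invented for illustration, and the real batching logic may be more nuanced.

```python
def should_process(pending_tokens: int, batch_max_tokens: int, flush_enabled: bool) -> bool:
    """Toy rule: process when the batch is full, or right away when flushing is on."""
    return flush_enabled or pending_tokens >= batch_max_tokens


# Hypothetical numbers: three short sessions and an assumed 1024-token threshold.
session_tokens = [120, 340, 90]
stuck = [t for t in session_tokens if not should_process(t, 1024, flush_enabled=False)]
flushed = [t for t in session_tokens if should_process(t, 1024, flush_enabled=True)]
print(len(stuck), len(flushed))
```

Under this model every short session stays pending without flushing, and every one is processed with it, which matches what we observed.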
Many agent clients inject recalled memory into prompts as context. That context is useful for the assistant, but it is not new user input.
If the deriver sees injected memory as ordinary conversation text, it can re-ingest old facts, summaries, and profile snippets as if the user just said them again. Worse, it can derive observations about the memory system itself:
- “The user asked to review the conversation.”
- “The user wants the assistant to update the skill library.”
- “The queue is draining.”
- “The assistant changed a prompt.”
- “A test email was sent.”
That is transcript exhaust, not durable memory.
The default-style deriver instruction was essentially “extract explicit atomic facts.” That sounds reasonable, but in an agent environment it is far too broad.
Agents generate a lot of text that is explicit but not worth remembering:
- one-off requests
- troubleshooting steps
- command output
- temporary plans
- assistant implementation details
- status updates
- testing artifacts
- “can you check X?” questions
- accidental or recalled context
A prompt that extracts “all explicit facts” will preserve all of that. It is not malicious; it is doing exactly what it was asked to do. Unfortunately, what it was asked to do is not what a useful long-term memory system should do.
The final working setup had four parts:
- Make the deriver actually process small personal-agent sessions.
- Route model and embedding providers correctly.
- Strip recalled memory context before derivation.
- Make the deriver conservative: only durable, future-useful observations survive.
For personal-agent workloads, enable deriver flushing:
DERIVER_FLUSH_ENABLED=true
Then recreate the API and deriver containers so the new environment is actually loaded:
cd ~/docker/honcho
docker compose up --detach --force-recreate api deriver
Do not rely on docker compose restart after changing .env; it restarts the existing containers without recreating them, so the new environment is not picked up.
You can verify that the deriver is running with:
docker logs --tail 100 honcho-deriver-1
And verify the live container environment with something like:
docker inspect honcho-deriver-1 --format '{{range .Config.Env}}{{println .}}{{end}}' \
  | grep -E 'DERIVER_FLUSH_ENABLED|DERIVER_REPRESENTATION_BATCH_MAX_TOKENS|DERIVER_MODEL_CONFIG'
Use your actual container names if they differ.
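If you would rather check the result in a script than by eye, a minimal sketch like the following parses the KEY=VALUE lines that the inspect command prints. The helper names are hypothetical, and treating "1" and "yes" as truthy alongside "true" is an assumption about how the setting is parsed.

```python
def parse_env_lines(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, such as the output of the docker inspect command."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            env[key] = value
    return env


def flush_enabled(env: dict[str, str]) -> bool:
    """True when DERIVER_FLUSH_ENABLED looks truthy (accepted spellings are assumed)."""
    return env.get("DERIVER_FLUSH_ENABLED", "").strip().lower() in {"1", "true", "yes"}


# Sample text standing in for real inspect output.
sample = "PATH=/usr/bin\nDERIVER_FLUSH_ENABLED=true\n"
print(flush_enabled(parse_env_lines(sample)))
```

Running this against the live container output, rather than against .env, is the point: it confirms what the process actually received.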
Honcho needs both:
- a text-generation model for deriver/dialectic/summary work; and
- an embedding model for vector search and stored observations.
These are separate concerns.
A known-good OpenAI-routed setup looks like this:
DERIVER_MODEL_CONFIG__TRANSPORT=openai
DERIVER_MODEL_CONFIG__MODEL=gpt-5.4-mini
SUMMARY_MODEL_CONFIG__TRANSPORT=openai
SUMMARY_MODEL_CONFIG__MODEL=gpt-5.4-mini
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__MODEL=gpt-5.4-mini
DIALECTIC_LEVELS__minimal__TOOL_CHOICE=required
DIALECTIC_LEVELS__low__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__low__MODEL_CONFIG__MODEL=gpt-5.4-mini
DIALECTIC_LEVELS__low__TOOL_CHOICE=required
DIALECTIC_LEVELS__medium__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__medium__MODEL_CONFIG__MODEL=gpt-5.4-mini
DIALECTIC_LEVELS__high__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__high__MODEL_CONFIG__MODEL=gpt-5.4-mini
DIALECTIC_LEVELS__max__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__max__MODEL_CONFIG__MODEL=gpt-5.4-mini
EMBEDDING_MODEL_CONFIG__TRANSPORT=openai
EMBEDDING_MODEL_CONFIG__MODEL=text-embedding-3-small
EMBEDDING_VECTOR_DIMENSIONS=1536
Notes:
- The default pgvector schema commonly expects 1536-dimensional embeddings. text-embedding-3-small matches that expectation.
- Some local embedding models return different dimensions and are not drop-in compatible without schema/config changes.
- OpenAI chat-completions does not accept tool_choice=any; use required for the minimal/low dialectic levels if routing those through OpenAI.
- If using Anthropic for text generation, watch for provider-specific validation problems such as whitespace-only stop sequences. Some providers reject things that others tolerate.
Keep secrets in .env; do not hardcode API keys in config files.
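The double-underscore names above read as nested settings. The sketch below expands them into a nested dictionary and flags any OpenAI-routed dialectic level still set to tool_choice=any. Both helper names are hypothetical, and whether Honcho parses the delimiter exactly this way is an assumption; the point is only to make the structure visible.

```python
def nest_env(env: dict[str, str], delimiter: str = "__") -> dict:
    """Expand KEY__SUBKEY=value pairs into nested dictionaries.

    Assumes no name is both a leaf and a parent (true for the settings above).
    """
    nested: dict = {}
    for name, value in env.items():
        node = nested
        *parents, leaf = name.split(delimiter)
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def openai_levels_with_any(nested: dict) -> list[str]:
    """List dialectic levels routed to OpenAI whose TOOL_CHOICE is 'any'."""
    levels = nested.get("DIALECTIC_LEVELS", {})
    return [
        name
        for name, cfg in levels.items()
        if cfg.get("MODEL_CONFIG", {}).get("TRANSPORT") == "openai"
        and cfg.get("TOOL_CHOICE") == "any"
    ]


# Hypothetical misconfiguration: "minimal" still uses tool_choice=any.
env = {
    "DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT": "openai",
    "DIALECTIC_LEVELS__minimal__TOOL_CHOICE": "any",
    "DIALECTIC_LEVELS__low__MODEL_CONFIG__TRANSPORT": "openai",
    "DIALECTIC_LEVELS__low__TOOL_CHOICE": "required",
}
print(openai_levels_with_any(nest_env(env)))
```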
Recalled memory is context for the assistant, not new user input.
src/deriver/deriver.py
Add the helper near the top of src/deriver/deriver.py, after imports and logger setup, before process_representation_tasks_batch(...).
You also need to import re in that file:
import re

_MEMORY_CONTEXT_RE = re.compile(
    r"<memory-context>.*?</memory-context>", re.IGNORECASE | re.DOTALL
)
_CONTEXT_COMPACTION_RE = re.compile(
    r"\[CONTEXT COMPACTION\s*(?:—|-|–)\s*REFERENCE ONLY\].*\Z",
    re.IGNORECASE | re.DOTALL,
)
_SYSTEM_MEMORY_NOTE_RE = re.compile(
    r"\[System note:\s*The following is recalled memory context, NOT new user input\.\s*Treat as informational background data\.\]",
    re.IGNORECASE,
)

def strip_memory_context(content: str) -> str:
    """Remove injected memory context from text before derivation."""
    content = _MEMORY_CONTEXT_RE.sub("", content)
    content = _CONTEXT_COMPACTION_RE.sub("", content)
    content = _SYSTEM_MEMORY_NOTE_RE.sub("", content)
    return content.strip()
Then update the message formatting inside process_representation_tasks_batch(...) in src/deriver/deriver.py.
Find the code that builds formatted_messages. It looks like this:
formatted_messages = "\n".join(
    format_new_turn_with_timestamp(msg.content, msg.created_at, msg.peer_name)
    for msg in messages
)
Change it to:
formatted_messages = "\n".join(
    format_new_turn_with_timestamp(
        strip_memory_context(msg.content), msg.created_at, msg.peer_name
    )
    for msg in messages
)
This prevents old memory and compacted handoff summaries from being recursively re-ingested as fresh evidence. It deliberately handles both common forms:
- tagged injected memory: <memory-context>...</memory-context>
- raw context-compaction handoff blocks: [CONTEXT COMPACTION — REFERENCE ONLY]... through the end of the message
Do not describe these blocks as “pasted” unless the user actually pasted them. In Hermes/Honcho workflows they may be inserted automatically by the memory/plugin layer.
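To see the stripping behavior on concrete input, here is a trimmed, standalone copy of the helper (the system-note pattern is left out for brevity) run against two made-up messages. The sample message text is invented; the regexes are the ones from the patch.

```python
import re

_MEMORY_CONTEXT_RE = re.compile(
    r"<memory-context>.*?</memory-context>", re.IGNORECASE | re.DOTALL
)
_CONTEXT_COMPACTION_RE = re.compile(
    r"\[CONTEXT COMPACTION\s*(?:—|-|–)\s*REFERENCE ONLY\].*\Z",
    re.IGNORECASE | re.DOTALL,
)


def strip_memory_context(content: str) -> str:
    """Trimmed copy of the helper: drops tagged memory and compaction tails."""
    content = _MEMORY_CONTEXT_RE.sub("", content)
    content = _CONTEXT_COMPACTION_RE.sub("", content)
    return content.strip()


# Invented messages: one with a tagged block, one with a compaction handoff.
tagged = (
    "<memory-context>\nThe user prefers concise answers.\n</memory-context>\n"
    "Can you check the deriver logs?"
)
compacted = (
    "Thanks, that worked.\n"
    "[CONTEXT COMPACTION - REFERENCE ONLY]\nEarlier the user asked about Redis."
)

print(strip_memory_context(tagged))
print(strip_memory_context(compacted))
```

Only the genuinely new text survives in each case. Note that the compaction pattern removes everything from the marker to the end of the message, so any real user text that follows the marker in the same message is dropped too.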
The deriver prompt was rewritten around a stricter rule:
Extract only high-value, durable observations that are likely to help a future assistant 30+ days from now.
src/deriver/prompts.py
Edit the minimal_deriver_prompt(...) function in src/deriver/prompts.py.
The original prompt asked the model to extract explicit atomic facts. Replace that with a durable-memory prompt. The exact wording can vary, but the important behavior is:
- extract fewer, higher-value observations;
- skip task/session progress;
- skip assistant tool/action history;
- skip troubleshooting exhaust;
- skip recalled memory, injected memory/context blocks, and compacted handoff summaries;
- only keep observations that are likely to help future conversations.
Use language like this inside minimal_deriver_prompt(...):
Analyze messages from {peer_id} to extract only high-value, durable memory observations about them.
[OBSERVATION] DEFINITION: A stable fact, preference, constraint, environment detail, or reusable convention about {peer_id} that is likely to still help an assistant 30+ days from now.
STRICTLY DO NOT extract observations for:
- One-off requests, questions, confirmations, or ordinary troubleshooting dialogue.
- Task-lifecycle state: asked, wanted, plans, will come back, is testing, shared a URL, responded positively, PR-ready, stashed, restored, ran command, updated file, queue is draining.
- Assistant self-observations, plans, status, commits, logs, prompt changes, tool calls, skill-writing actions, or implementation progress.
- URLs, issue counts, branch names, commit hashes, temporary file names, logs, raw command output, or package-lock details unless explicitly stable reference material.
- Tool availability checks, test messages, or meta-discussion about memory/deriver quality.
- Facts that are only true inside the current session or depend on temporary state.
30-DAY TEST:
Before writing an observation, ask: Would knowing this materially help in a fresh conversation a month from now? If not, output nothing for it.
The key conceptual change is this:
Old: Extract all explicit atomic facts.
New: Extract only durable, future-useful observations.
That one change matters a lot.
Prompting alone was not enough. The LLM still occasionally emitted transcript junk, so we added a conservative deterministic filter after derivation.
src/deriver/deriver.py
Add is_durable_observation(...) in src/deriver/deriver.py, near strip_memory_context(...), above process_representation_tasks_batch(...).
In our patch, this lives near the top of the file after:
logger = logging.getLogger(__name__)
and before:
def _get_deriver_model_config() -> ConfiguredModelSettings:
    return settings.DERIVER.MODEL_CONFIG
Apply it inside process_representation_tasks_batch(...) in src/deriver/deriver.py, after the LLM has produced observations and after assign_observation_metadata(...) has run, but before this empty-check/store path:
if observations.is_empty() or not message_ids:
    logger.warning(...)
    return None
In other words, the final flow should be:
format messages
call minimal deriver / LLM
assign observation metadata
filter observations.explicit with is_durable_observation(...)
if empty, return
store observations
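The ordering matters more than the details, so here is a minimal sketch of just the filter-then-check-then-store part. Everything in it is a stand-in: `Observation`, `Observations`, `derive_and_store`, and the toy `no_asked` filter are hypothetical and are not Honcho's real types or functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Observation:
    content: str


@dataclass
class Observations:
    explicit: list[Observation] = field(default_factory=list)

    def is_empty(self) -> bool:
        return not self.explicit


def derive_and_store(
    observations: Observations,
    observed: str,
    is_durable: Callable[[str, str], bool],
    store: Callable[[Observations], None],
) -> Optional[Observations]:
    """Filter first, then run the empty check, so all-junk batches store nothing."""
    observations.explicit = [
        obs for obs in observations.explicit if is_durable(obs.content, observed)
    ]
    if observations.is_empty():
        return None
    store(observations)
    return observations


def no_asked(content: str, observed: str) -> bool:
    """Stand-in filter for this demo only; not the real durability filter."""
    return " asked " not in f" {content.lower()} "


stored: list[Observations] = []
junk_only = Observations([Observation("The user asked to check Honcho status.")])
result = derive_and_store(junk_only, "user", no_asked, stored.append)
print(result, len(stored))
```

Because the filter runs before the empty check, a batch that contained only transient observations takes the same early-return path as a batch that produced nothing at all.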
Example shape:
_TRANSIENT_OBSERVATION_PATTERNS = (
    " asked ",
    " asks ",
    " asked whether ",
    " asked to ",
    " requested ",
    " wants ",
    " wanted ",
    " plans ",
    " is testing ",
    " shared ",
    " responded ",
    " will come back",
    " test email",
    " test message",
    " review the conversation",
    " conversation above",
    " queue",
    " deriver",
    " prompt",
    " backlog",
    " stashed",
    " restored",
    " ran command",
    " updated file",
    " pr-ready",
    " commit",
    " logs",
)

_HARD_REJECT_OBSERVATION_PATTERNS = (
    "review the conversation",
    "conversation above",
    "test email",
    "test message",
    "asked whether",
    "asked to stash",
    "queue is draining",
    "prompts.py patched",
)

_DURABLE_OBSERVATION_MARKERS = (
    " prefers ",
    " uses ",
    " runs ",
    " owns ",
    " has ",
    " is configured ",
    " is using ",
    " email address",
    " domain",
    " homelab",
    " active directory",
    " recurring",
    " default",
    " should be assumed",
    " does not want",
    " wants the assistant",
    " prefers the assistant",
)

_ASSISTANT_DURABLE_MARKERS = (
    "wants the assistant",
    "prefers the assistant",
    "does not want the assistant",
    "hermes setup",
    "hermes uses",
    "hermes runs",
    "hermes is configured",
)
def is_durable_observation(content: str, observed: str) -> bool:
    """Return whether an observation is durable enough to persist.

    The LLM prompt already asks for durable observations, but the deriver can
    still emit task-progress facts. Keep this deterministic gate conservative:
    false negatives are less harmful than polluting long-term memory with
    transient transcript trivia.
    """
    normalized = f" {content.lower().strip()} "
    if not normalized.strip():
        return False
    if any(pattern in normalized for pattern in _HARD_REJECT_OBSERVATION_PATTERNS):
        return False
    observed_lower = observed.lower().strip()
    if observed_lower in {"hermes", "assistant", "ai", "agent"}:
        return any(marker in normalized for marker in _ASSISTANT_DURABLE_MARKERS)
    has_transient_pattern = any(
        pattern in normalized for pattern in _TRANSIENT_OBSERVATION_PATTERNS
    )
    has_durable_marker = any(
        marker in normalized for marker in _DURABLE_OBSERVATION_MARKERS
    )
    if has_transient_pattern and not has_durable_marker:
        return False
    if (" assistant" in normalized or " hermes" in normalized) and not any(
        marker in normalized for marker in _ASSISTANT_DURABLE_MARKERS
    ):
        return False
    return True
Then, inside process_representation_tasks_batch(...), add:
observations.explicit = [
    obs
    for obs in observations.explicit
    if is_durable_observation(obs.content, observed)
]
Place that immediately after assign_observation_metadata(...):
assign_observation_metadata(
    observations,
    observed,
    latest_message.session_name,
    latest_message.created_at,
)
observations.explicit = [
    obs
    for obs in observations.explicit
    if is_durable_observation(obs.content, observed)
]
if observations.is_empty() or not message_ids:
    ...
This is intentionally conservative. Missing a marginal fact is much better than permanently storing junk and having it injected into every future conversation.
If the observed peer is the assistant itself, almost everything should be rejected.
Useful assistant-peer observations are rare. They should generally be stable deployment facts, such as:
Hermes is configured to use Honcho as its memory provider.
Do not store:
Hermes updated a file.
Hermes ran tests.
Hermes created a skill.
Hermes plans to open a PR.
Hermes found a log message.
Those are not identity or memory. They are build artifacts wearing a fake mustache.
The assistant-peer strictness is implemented in the same is_durable_observation(...) function in:
src/deriver/deriver.py
Specifically this branch:
observed_lower = observed.lower().strip()
if observed_lower in {"hermes", "assistant", "ai", "agent"}:
    return any(marker in normalized for marker in _ASSISTANT_DURABLE_MARKERS)
The schema descriptions for explicit observations were also tightened.
src/utils/representation.py
Update the description= text for the explicit fields in:
PromptRepresentation.explicit
Representation.explicit
Both are in src/utils/representation.py.
Instead of describing explicit observations as literal facts or clear paraphrases, describe them as durable, future-useful observations and explicitly exclude:
- one-off requests
- questions
- task progress
- assistant actions
- logs
- tests
- temporary troubleshooting state
Example replacement description:
explicit: list[ExplicitObservationBase] = Field(
    description=(
        "Durable, future-useful observations about the observed peer only. "
        "Exclude one-off requests, questions, task progress, assistant actions, "
        "logs, tests, and temporary troubleshooting state. Prefer [] over "
        "low-value observations."
    ),
    default_factory=list,
)
This helps because tool/function schemas are part of what the model sees. If the schema says “extract facts literally stated by the user,” the model will do that. If the schema says “durable observations only,” the model has a much better target.
Good durable observations:
The user prefers concise, direct answers.
The user uses Fastmail for email and iCloud for calendar.
The user wants confirmation before file edits.
The user's homelab Active Directory domain is home.example.com.
The user stores secrets in .env files rather than hardcoding them in config files.
Bad observations:
The user asked to check Honcho status.
The user asked whether the queue was draining.
The assistant updated prompts.py.
The user sent a test email.
The user wants to review this conversation.
The deriver backlog has 449 pending tasks.
The assistant said the fix is PR-ready.
The good observations change future behavior. The bad ones merely summarize what happened.
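It is worth running the deterministic filter over the two lists above to see where substring heuristics agree with the intent and where they do not. The block below is a compact, standalone copy of the filter logic and pattern lists from the patch (tuple names shortened), applied with the observed peer assumed to be `user`.

```python
TRANSIENT = (
    " asked ", " asks ", " asked whether ", " asked to ", " requested ", " wants ",
    " wanted ", " plans ", " is testing ", " shared ", " responded ", " will come back",
    " test email", " test message", " review the conversation", " conversation above",
    " queue", " deriver", " prompt", " backlog", " stashed", " restored",
    " ran command", " updated file", " pr-ready", " commit", " logs",
)
HARD_REJECT = (
    "review the conversation", "conversation above", "test email", "test message",
    "asked whether", "asked to stash", "queue is draining", "prompts.py patched",
)
DURABLE = (
    " prefers ", " uses ", " runs ", " owns ", " has ", " is configured ", " is using ",
    " email address", " domain", " homelab", " active directory", " recurring",
    " default", " should be assumed", " does not want", " wants the assistant",
    " prefers the assistant",
)
ASSISTANT_DURABLE = (
    "wants the assistant", "prefers the assistant", "does not want the assistant",
    "hermes setup", "hermes uses", "hermes runs", "hermes is configured",
)


def is_durable_observation(content: str, observed: str) -> bool:
    """Same decision order as the patched filter, reproduced so this runs alone."""
    normalized = f" {content.lower().strip()} "
    if not normalized.strip():
        return False
    if any(p in normalized for p in HARD_REJECT):
        return False
    if observed.lower().strip() in {"hermes", "assistant", "ai", "agent"}:
        return any(m in normalized for m in ASSISTANT_DURABLE)
    transient = any(p in normalized for p in TRANSIENT)
    durable = any(m in normalized for m in DURABLE)
    if transient and not durable:
        return False
    if (" assistant" in normalized or " hermes" in normalized) and not any(
        m in normalized for m in ASSISTANT_DURABLE
    ):
        return False
    return True


good = [
    "The user prefers concise, direct answers.",
    "The user uses Fastmail for email and iCloud for calendar.",
    "The user wants confirmation before file edits.",
    "The user's homelab Active Directory domain is home.example.com.",
    "The user stores secrets in .env files rather than hardcoding them in config files.",
]
bad = [
    "The user asked to check Honcho status.",
    "The user asked whether the queue was draining.",
    "The assistant updated prompts.py.",
    "The user sent a test email.",
    "The user wants to review this conversation.",
    "The deriver backlog has 449 pending tasks.",
    "The assistant said the fix is PR-ready.",
]

good_results = [is_durable_observation(text, "user") for text in good]
bad_results = [is_durable_observation(text, "user") for text in bad]
for kept, text in zip(good_results + bad_results, good + bad):
    print("keep" if kept else "drop", text)
```

With these exact pattern lists, two examples land on the unexpected side. "The user wants confirmation before file edits." is dropped, because " wants " is a transient pattern and no durable marker matches. "The deriver backlog has 449 pending tasks." is kept, because " has " counts as a durable marker and overrides the transient hits. Cases like these are good candidates for regression tests before tuning the lists.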
If you want to implement the same changes, the main files are:
src/deriver/deriver.py
src/deriver/prompts.py
src/utils/representation.py
Optional but recommended test coverage can go somewhere like:
tests/utils/test_observation_quality.py
Include regression tests for at least:
- stripping tagged injected memory context: <memory-context>...</memory-context>
- stripping raw compaction handoffs: [CONTEXT COMPACTION — REFERENCE ONLY]...
- rejecting transient observations like “asked to…”, “queue is draining”, “updated file”, and test-message/task-progress facts
- retaining durable observations like stable preferences, recurring tools, long-lived configuration, domains, addresses, and homelab conventions
Minimum implementation steps:
- In src/deriver/deriver.py, import re.
- In src/deriver/deriver.py, add strip_memory_context(...) and call it while building formatted_messages inside process_representation_tasks_batch(...). It should strip both <memory-context>...</memory-context> and raw [CONTEXT COMPACTION — REFERENCE ONLY]... blocks.
- In src/deriver/prompts.py, rewrite minimal_deriver_prompt(...) to ask for durable 30-day memory instead of all explicit facts.
- In src/deriver/deriver.py, add is_durable_observation(...) near the top of the file.
- In src/deriver/deriver.py, apply the filter to observations.explicit inside process_representation_tasks_batch(...), after assign_observation_metadata(...) and before observations.is_empty() is checked.
- In src/utils/representation.py, update PromptRepresentation.explicit and Representation.explicit descriptions to match the durable-memory behavior.
- Set DERIVER_FLUSH_ENABLED=true for low-volume personal-agent workloads.
- Recreate containers with docker compose up --detach --force-recreate api deriver.
If Honcho is running but producing poor memory, check this sequence:
- Is the API reachable?
  curl http://127.0.0.1:8000/docs
- Are the containers running?
  docker ps | grep honcho
- Are model and embedding providers configured with real keys?
  Check API and deriver logs. Provider errors often show up there before they show up clearly in the client.
- Did you recreate containers after editing .env?
  docker compose up --detach --force-recreate api deriver
- Is deriver flushing enabled for low-volume personal-agent sessions?
  DERIVER_FLUSH_ENABLED=true
- Is recalled memory stripped before derivation?
  Do not let <memory-context>...</memory-context> or raw [CONTEXT COMPACTION — REFERENCE ONLY]... summaries become fresh memory.
- Does the deriver prompt ask for durable memory, not all facts?
  “All explicit facts” is too broad for agent transcripts.
- Is there a deterministic quality filter after the LLM?
  Use a conservative reject filter for transient/task-progress observations in src/deriver/deriver.py.
- Is the peer card actually populated?
  Inspect peers.internal_metadata or call the peer-card endpoint directly. Search/context working does not prove the profile/card layer is populated.
Honcho worked much better once we stopped treating every transcript fact as memory.
The useful mental model is:
Conversation transcript != long-term memory
Tool output != user preference
Recalled memory != new evidence
Assistant task history != identity
Temporary troubleshooting state != durable fact
A memory system should be lossy. It should forget aggressively. The goal is not to preserve the conversation; the goal is to preserve the small set of facts that will make future conversations better.
Once the deriver was made conservative, injected memory was stripped, and small work units were flushed, Honcho became dramatically more useful.