Getting Useful Results from Self-Hosted Honcho

This is a field guide for the failure mode where Honcho is technically running, but the memory it produces is noisy, empty, or actively unhelpful.

The short version: Honcho is very sensitive to input quality. Garbage in, garbage out. If you feed the deriver recalled memory, assistant task history, tool chatter, temporary troubleshooting state, and every random user request as if it were durable autobiographical truth, Honcho will faithfully preserve garbage and then recall it later as context. That makes the agent feel haunted by its own transcript.

This document describes the symptoms we saw and the final changes that made Honcho useful in practice.

Symptoms

The system looked healthy at the infrastructure level:

Honcho API was up.
The deriver container was running.
Redis and Postgres were healthy.
Search/context APIs returned data.
The client integration could talk to the API.

But the actual memory behavior was bad.

1. Profile / peer-card stayed empty

The profile/peer-card endpoint responded, but returned no useful facts.

In our setup, the client queried Honcho roughly like this:

GET /v3/workspaces/<workspace>/peers/<assistant_peer>/card?target=<user_peer>

That means the card is the assistant peer's model of the user peer.
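As a rough illustration, the request path above can be assembled like this. The helper name peer_card_url, the base URL, and the workspace name are placeholders for illustration, not part of Honcho; only the path shape comes from the request shown above.

```python
from urllib.parse import quote, urlencode


def peer_card_url(base_url: str, workspace: str, observer: str, target: str) -> str:
    """Build the URL for the observer peer's card about the target peer.

    Hypothetical helper: only the path shape is taken from the request above.
    """
    path = (
        f"/v3/workspaces/{quote(workspace, safe='')}"
        f"/peers/{quote(observer, safe='')}/card"
    )
    return f"{base_url.rstrip('/')}{path}?{urlencode({'target': target})}"


# Placeholder values; substitute your own deployment's names.
url = peer_card_url("http://127.0.0.1:8000", "my-workspace", "assistant", "user")
print(url)
```

The observer goes in the path and the observed peer goes in the target query parameter, which is easy to get backwards when debugging an empty card.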

On current Honcho main-branch builds, peer cards are stored in peers.internal_metadata:

peer_card              # if observer == observed
<observed>_peer_card   # if observer != observed

So an assistant peer named assistant observing a user peer named user would store the user card on the assistant peer under:

{
  "user_peer_card": [
    "..."
  ]
}

In our case, the relevant peers had empty internal_metadata, so the profile endpoint was not wrong; there was simply no peer card to return.
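The key naming rule above is small enough to sketch directly. The function names here are made up for illustration, and the dictionary stands in for whatever you read out of peers.internal_metadata; this is not Honcho's own lookup code.

```python
def peer_card_key(observer: str, observed: str) -> str:
    """Return the internal_metadata key that would hold the observed peer's card."""
    if observer == observed:
        return "peer_card"
    return f"{observed}_peer_card"


def read_peer_card(internal_metadata: dict, observer: str, observed: str) -> list:
    """Return the stored card entries, or an empty list when no card exists."""
    return list(internal_metadata.get(peer_card_key(observer, observed), []))


# An assistant peer observing a user peer, with nothing stored yet.
print(peer_card_key("assistant", "user"))       # user_peer_card
print(read_peer_card({}, "assistant", "user"))  # []
```

An empty list here matches the symptom described above: the endpoint works, but there is nothing stored under the expected key.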

2. Search/context worked, but profile did not

This was an important distinction.

Search/context APIs can work while the profile/peer-card layer remains empty. Search/context read observations and session data. Profile reads the peer-card layer. Those are related, but not the same thing.

So “Honcho is connected” does not necessarily mean “Honcho has built a useful user profile.”

3. The deriver queue was not draining reliably

With per-session memory, many representation work units were small. Honcho's deriver batches work, and it can hold a work unit until it reaches a token threshold before processing it.

For sporadic personal-agent usage, many sessions never reach that threshold. The result is a backlog of representation tasks that are technically pending but practically stuck.

The fix was to enable deriver flushing:

DERIVER_FLUSH_ENABLED=true

This makes small personal-agent sessions get processed instead of waiting forever for enough tokens to accumulate.
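A toy model of the behavior described above, to make the stuck-backlog effect concrete. This is not Honcho's scheduling code: the function name, the token counts, and the 1024 threshold are all invented for illustration.

```python
def ready_to_process(pending_tokens: int, batch_max_tokens: int, flush_enabled: bool) -> bool:
    """Toy rule: process once the threshold is reached, or always when flushing."""
    return flush_enabled or pending_tokens >= batch_max_tokens


# Hypothetical small sessions from sporadic personal-agent use.
session_tokens = [120, 45, 300]
threshold = 1024  # placeholder value, not a Honcho default

without_flush = [ready_to_process(t, threshold, False) for t in session_tokens]
with_flush = [ready_to_process(t, threshold, True) for t in session_tokens]
print(without_flush)  # [False, False, False]
print(with_flush)     # [True, True, True]
```

In this model none of the small sessions ever crosses the threshold, so without flushing they stay pending indefinitely.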

4. Recalled memory was being re-derived as new memory

Many agent clients inject recalled memory into prompts as context. That context is useful for the assistant, but it is not new user input.

If the deriver sees injected memory as ordinary conversation text, it can re-ingest old facts, summaries, and profile snippets as if the user just said them again. Worse, it can derive observations about the memory system itself:

“The user asked to review the conversation.”
“The user wants the assistant to update the skill library.”
“The queue is draining.”
“The assistant changed a prompt.”
“A test email was sent.”

That is transcript exhaust, not durable memory.

5. The deriver prompt was too permissive

The default-style deriver instruction was essentially “extract explicit atomic facts.” That sounds reasonable, but in an agent environment it is far too broad.

Agents generate a lot of text that is explicit but not worth remembering:

one-off requests
troubleshooting steps
command output
temporary plans
assistant implementation details
status updates
testing artifacts
“can you check X?” questions
accidental or recalled context

A prompt that extracts “all explicit facts” will preserve all of that. It is not malicious; it is doing exactly what it was asked to do. Unfortunately, what it was asked to do is not what a useful long-term memory system should do.

The Working Shape

The final working setup had four parts. The numbered sections below cover them in order, with the last part spanning sections 4 through 7:

Make the deriver actually process small personal-agent sessions.
Route model and embedding providers correctly.
Strip recalled memory context before derivation.
Make the deriver conservative: only durable, future-useful observations survive.

1. Enable Deriver Flushing

For personal-agent workloads, enable deriver flushing:

DERIVER_FLUSH_ENABLED=true

Then recreate the API and deriver containers so the new environment is actually loaded:

cd ~/docker/honcho
docker compose up --detach --force-recreate api deriver

Do not rely on docker compose restart after changing .env. It restarts the existing containers as they are and does not recreate them, so the new environment values are not picked up.

You can verify that the deriver is running with:

docker logs --tail 100 honcho-deriver-1

And verify the live container environment with something like:

docker inspect honcho-deriver-1 --format '{{range .Config.Env}}{{println .}}{{end}}' \
  | grep -E 'DERIVER_FLUSH_ENABLED|DERIVER_REPRESENTATION_BATCH_MAX_TOKENS|DERIVER_MODEL_CONFIG'

Use your actual container names if they differ.
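If you would rather check the result in a script than by eye, the docker inspect output above is one NAME=value pair per line and is easy to parse. The sketch below works on captured text; the helper names and the sample output are made up, and it only checks for the literal value this guide sets.

```python
def parse_env_lines(text: str) -> dict:
    """Parse NAME=value lines (as printed by the inspect command) into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key] = value
    return env


def flush_is_enabled(env: dict) -> bool:
    """True when DERIVER_FLUSH_ENABLED is set to 'true' (case-insensitive)."""
    return env.get("DERIVER_FLUSH_ENABLED", "").strip().lower() == "true"


# Invented sample of what the command might print.
sample_output = "PATH=/usr/local/bin:/usr/bin\nDERIVER_FLUSH_ENABLED=true\n"
env = parse_env_lines(sample_output)
print(flush_is_enabled(env))  # True
```

A False result after editing .env usually means the container was restarted rather than recreated.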

2. Route Models and Embeddings Correctly

Honcho needs both:

a text-generation model for deriver/dialectic/summary work; and
an embedding model for vector search and stored observations.

These are separate concerns.

A known-good OpenAI-routed setup looks like this:

DERIVER_MODEL_CONFIG__TRANSPORT=openai
DERIVER_MODEL_CONFIG__MODEL=gpt-5.4-mini

SUMMARY_MODEL_CONFIG__TRANSPORT=openai
SUMMARY_MODEL_CONFIG__MODEL=gpt-5.4-mini

DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__minimal__MODEL_CONFIG__MODEL=gpt-5.4-mini
DIALECTIC_LEVELS__minimal__TOOL_CHOICE=required

DIALECTIC_LEVELS__low__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__low__MODEL_CONFIG__MODEL=gpt-5.4-mini
DIALECTIC_LEVELS__low__TOOL_CHOICE=required

DIALECTIC_LEVELS__medium__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__medium__MODEL_CONFIG__MODEL=gpt-5.4-mini

DIALECTIC_LEVELS__high__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__high__MODEL_CONFIG__MODEL=gpt-5.4-mini

DIALECTIC_LEVELS__max__MODEL_CONFIG__TRANSPORT=openai
DIALECTIC_LEVELS__max__MODEL_CONFIG__MODEL=gpt-5.4-mini

EMBEDDING_MODEL_CONFIG__TRANSPORT=openai
EMBEDDING_MODEL_CONFIG__MODEL=text-embedding-3-small
EMBEDDING_VECTOR_DIMENSIONS=1536
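The double underscores in these names read as nesting: each segment selects one level of a settings tree. How Honcho itself parses them is not shown in this guide, so treat the sketch below as an illustration of the grouping the names suggest, with nest_env being a hypothetical helper.

```python
def nest_env(env: dict, delimiter: str = "__") -> dict:
    """Group flat NAME__SUB__KEY variables into a nested dict (illustration only)."""
    nested = {}
    for name, value in env.items():
        parts = name.split(delimiter)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


flat = {
    "DERIVER_MODEL_CONFIG__TRANSPORT": "openai",
    "DIALECTIC_LEVELS__minimal__MODEL_CONFIG__TRANSPORT": "openai",
    "DIALECTIC_LEVELS__minimal__TOOL_CHOICE": "required",
    "EMBEDDING_VECTOR_DIMENSIONS": "1536",
}
tree = nest_env(flat)
print(tree["DIALECTIC_LEVELS"]["minimal"])
```

Seeing the tree makes one common mistake easier to spot: setting a model for one dialectic level and assuming the other levels inherit it.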

Notes:

The default pgvector schema commonly expects 1536-dimensional embeddings.
text-embedding-3-small matches that expectation.
Some local embedding models return different dimensions and are not drop-in compatible without schema/config changes.
OpenAI chat-completions does not accept tool_choice=any; use required for the minimal/low dialectic levels if routing those through OpenAI.
If using Anthropic for text generation, watch for provider-specific validation problems such as whitespace-only stop sequences. Some providers reject things that others tolerate.
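When swapping embedding providers, a quick length check on one returned vector catches a dimension mismatch before it turns into a storage error. This is a minimal sketch: check_embedding_dimensions is a hypothetical helper, and the 768-length vector is an invented example of a model that does not match.

```python
def check_embedding_dimensions(vector: list, expected: int = 1536) -> None:
    """Raise ValueError when a vector's length differs from the configured size."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, expected {expected}"
        )


# A vector of the configured size passes silently.
check_embedding_dimensions([0.0] * 1536)

# A differently sized vector (invented example) is rejected.
try:
    check_embedding_dimensions([0.0] * 768)
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print(err)
```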

Keep secrets in .env; do not hardcode API keys in config files.

3. Strip Injected Memory Before Derivation

Recalled memory is context for the assistant, not new user input.

File changed

src/deriver/deriver.py

Where to add it

Add the helper near the top of src/deriver/deriver.py, after imports and logger setup, before process_representation_tasks_batch(...).

You also need to import re in that file:

import re

Code

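# Tagged memory blocks injected by the client: <memory-context>...</memory-context>.
_MEMORY_CONTEXT_RE = re.compile(
    r"<memory-context>.*?</memory-context>", re.IGNORECASE | re.DOTALL
)
# Raw compaction handoff: everything from the marker to the end of the message.
# The separator may be an em dash, an en dash, or a plain hyphen.
_CONTEXT_COMPACTION_RE = re.compile(
    r"\[CONTEXT COMPACTION\s*(?:—|-|–)\s*REFERENCE ONLY\].*\Z",
    re.IGNORECASE | re.DOTALL,
)
# The bare system note that introduces recalled memory.
_SYSTEM_MEMORY_NOTE_RE = re.compile(
    r"\[System note:\s*The following is recalled memory context, NOT new user input\.\s*Treat as informational background data\.\]",
    re.IGNORECASE,
)


def strip_memory_context(content: str) -> str:
    """Remove injected memory context from text before derivation."""
    content = _MEMORY_CONTEXT_RE.sub("", content)
    content = _CONTEXT_COMPACTION_RE.sub("", content)
    content = _SYSTEM_MEMORY_NOTE_RE.sub("", content)
    # A message that was nothing but injected context becomes an empty string.
    return content.strip()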

Then update the message formatting inside process_representation_tasks_batch(...) in src/deriver/deriver.py.

Find the code that builds formatted_messages. It looks like this:

formatted_messages = "\n".join(
    format_new_turn_with_timestamp(msg.content, msg.created_at, msg.peer_name)
    for msg in messages
)

Change it to:

formatted_messages = "\n".join(
    format_new_turn_with_timestamp(
        strip_memory_context(msg.content), msg.created_at, msg.peer_name
    )
    for msg in messages
)

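This prevents old memory and compacted handoff summaries from being recursively re-ingested as fresh evidence. It deliberately handles both common block forms, and also removes the bare system note that introduces recalled memory:

tagged injected memory: <memory-context>...</memory-context>
raw context-compaction handoff blocks: [CONTEXT COMPACTION — REFERENCE ONLY]... through the end of the message
the system note line: [System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]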

Do not describe these blocks as “pasted” unless the user actually pasted them. In Hermes/Honcho workflows they may be inserted automatically by the memory/plugin layer.
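To see what the helper does to real-looking input, here is a self-contained run. The regexes and function are repeated from the patch above so the snippet runs on its own; the sample messages are invented.

```python
import re

_MEMORY_CONTEXT_RE = re.compile(
    r"<memory-context>.*?</memory-context>", re.IGNORECASE | re.DOTALL
)
_CONTEXT_COMPACTION_RE = re.compile(
    r"\[CONTEXT COMPACTION\s*(?:—|-|–)\s*REFERENCE ONLY\].*\Z",
    re.IGNORECASE | re.DOTALL,
)
_SYSTEM_MEMORY_NOTE_RE = re.compile(
    r"\[System note:\s*The following is recalled memory context, NOT new user input\.\s*Treat as informational background data\.\]",
    re.IGNORECASE,
)


def strip_memory_context(content: str) -> str:
    """Remove injected memory context from text before derivation."""
    content = _MEMORY_CONTEXT_RE.sub("", content)
    content = _CONTEXT_COMPACTION_RE.sub("", content)
    content = _SYSTEM_MEMORY_NOTE_RE.sub("", content)
    return content.strip()


# Invented sample messages covering both block forms.
tagged = "<memory-context>The user prefers tea.</memory-context>\nPlease draft the email."
compacted = (
    "Thanks, that worked.\n"
    "[CONTEXT COMPACTION — REFERENCE ONLY]\n"
    "Earlier the assistant patched prompts.py."
)
only_memory = "<memory-context>old facts</memory-context>"

print(repr(strip_memory_context(tagged)))       # 'Please draft the email.'
print(repr(strip_memory_context(compacted)))    # 'Thanks, that worked.'
print(repr(strip_memory_context(only_memory)))  # ''
```

The last case is worth noting: a message that consisted only of injected context becomes empty, so the deriver has nothing to extract from it.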

4. Replace “Extract All Facts” with “Extract Durable Memory”

The deriver prompt was rewritten around a stricter rule:

Extract only high-value, durable observations that are likely to help a future assistant 30+ days from now.

File changed

src/deriver/prompts.py

Where to change it

Edit the minimal_deriver_prompt(...) function in src/deriver/prompts.py.

The original prompt asked the model to extract explicit atomic facts. Replace that with a durable-memory prompt. The exact wording can vary, but the important behavior is:

extract fewer, higher-value observations;
skip task/session progress;
skip assistant tool/action history;
skip troubleshooting exhaust;
skip recalled memory, injected memory/context blocks, and compacted handoff summaries;
only keep observations that are likely to help future conversations.

Prompt shape

Use language like this inside minimal_deriver_prompt(...):

Analyze messages from {peer_id} to extract only high-value, durable memory observations about them.

[OBSERVATION] DEFINITION: A stable fact, preference, constraint, environment detail, or reusable convention about {peer_id} that is likely to still help an assistant 30+ days from now.

STRICTLY DO NOT extract observations for:
- One-off requests, questions, confirmations, or ordinary troubleshooting dialogue.
- Task-lifecycle state: asked, wanted, plans, will come back, is testing, shared a URL, responded positively, PR-ready, stashed, restored, ran command, updated file, queue is draining.
- Assistant self-observations, plans, status, commits, logs, prompt changes, tool calls, skill-writing actions, or implementation progress.
- URLs, issue counts, branch names, commit hashes, temporary file names, logs, raw command output, or package-lock details unless explicitly stable reference material.
- Tool availability checks, test messages, or meta-discussion about memory/deriver quality.
- Facts that are only true inside the current session or depend on temporary state.

30-DAY TEST:
Before writing an observation, ask: Would knowing this materially help in a fresh conversation a month from now? If not, output nothing for it.

The key conceptual change is this:

Old: Extract all explicit atomic facts.
New: Extract only durable, future-useful observations.

That one change matters a lot.
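One way to hold the prompt text is as a template with the peer name filled in at call time. This is a sketch under assumptions: durable_deriver_prompt is a hypothetical name, the real minimal_deriver_prompt(...) signature is not shown in this guide, the exclusion list is abbreviated, and the trailing Messages section is an assumed layout.

```python
# Abbreviated version of the prompt wording shown above.
DURABLE_PROMPT_TEMPLATE = """\
Analyze messages from {peer_id} to extract only high-value, durable memory observations about them.

[OBSERVATION] DEFINITION: A stable fact, preference, constraint, environment detail, or reusable convention about {peer_id} that is likely to still help an assistant 30+ days from now.

STRICTLY DO NOT extract observations for:
- One-off requests, questions, confirmations, or ordinary troubleshooting dialogue.
- Facts that are only true inside the current session or depend on temporary state.

30-DAY TEST:
Before writing an observation, ask: Would knowing this materially help in a fresh conversation a month from now? If not, output nothing for it.

Messages:
{messages}
"""


def durable_deriver_prompt(peer_id: str, messages: str) -> str:
    """Fill the template; the 'Messages' section is an assumed layout."""
    return DURABLE_PROMPT_TEMPLATE.format(peer_id=peer_id, messages=messages)


prompt = durable_deriver_prompt("user", "user: I always use Fastmail for email.")
print(prompt)
```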

5. Add a Deterministic Post-Derivation Filter

Prompting alone was not enough. The LLM still occasionally emitted transcript junk, so we added a conservative deterministic filter after derivation.

File changed

src/deriver/deriver.py

Where to add the filter function

Add is_durable_observation(...) in src/deriver/deriver.py, near strip_memory_context(...), above process_representation_tasks_batch(...).

In our patch, this lives near the top of the file after:

logger = logging.getLogger(__name__)

and before:

def _get_deriver_model_config() -> ConfiguredModelSettings:
    return settings.DERIVER.MODEL_CONFIG

Where to apply the filter

Apply it inside process_representation_tasks_batch(...) in src/deriver/deriver.py, after the LLM has produced observations and after assign_observation_metadata(...) has run, but before this empty-check/store path:

if observations.is_empty() or not message_ids:
    logger.warning(...)
    return None

In other words, the final flow should be:

format messages
call minimal deriver / LLM
assign observation metadata
filter observations.explicit with is_durable_observation(...)
if empty, return
store observations

Code

Example shape:

_TRANSIENT_OBSERVATION_PATTERNS = (
    " asked ",
    " asks ",
    " asked whether ",
    " asked to ",
    " requested ",
    " wants ",
    " wanted ",
    " plans ",
    " is testing ",
    " shared ",
    " responded ",
    " will come back",
    " test email",
    " test message",
    " review the conversation",
    " conversation above",
    " queue",
    " deriver",
    " prompt",
    " backlog",
    " stashed",
    " restored",
    " ran command",
    " updated file",
    " pr-ready",
    " commit",
    " logs",
)

_HARD_REJECT_OBSERVATION_PATTERNS = (
    "review the conversation",
    "conversation above",
    "test email",
    "test message",
    "asked whether",
    "asked to stash",
    "queue is draining",
    "prompts.py patched",
)

_DURABLE_OBSERVATION_MARKERS = (
    " prefers ",
    " uses ",
    " runs ",
    " owns ",
    " has ",
    " is configured ",
    " is using ",
    " email address",
    " domain",
    " homelab",
    " active directory",
    " recurring",
    " default",
    " should be assumed",
    " does not want",
    " wants the assistant",
    " prefers the assistant",
)

_ASSISTANT_DURABLE_MARKERS = (
    "wants the assistant",
    "prefers the assistant",
    "does not want the assistant",
    "hermes setup",
    "hermes uses",
    "hermes runs",
    "hermes is configured",
)


def is_durable_observation(content: str, observed: str) -> bool:
    """Return whether an observation is durable enough to persist.

    The LLM prompt already asks for durable observations, but the deriver can
    still emit task-progress facts. Keep this deterministic gate conservative:
    false negatives are less harmful than polluting long-term memory with
    transient transcript trivia.
    """
    normalized = f" {content.lower().strip()} "

    if not normalized.strip():
        return False

    if any(pattern in normalized for pattern in _HARD_REJECT_OBSERVATION_PATTERNS):
        return False

    observed_lower = observed.lower().strip()
    if observed_lower in {"hermes", "assistant", "ai", "agent"}:
        return any(marker in normalized for marker in _ASSISTANT_DURABLE_MARKERS)

    has_transient_pattern = any(
        pattern in normalized for pattern in _TRANSIENT_OBSERVATION_PATTERNS
    )
    has_durable_marker = any(
        marker in normalized for marker in _DURABLE_OBSERVATION_MARKERS
    )

    if has_transient_pattern and not has_durable_marker:
        return False

    if (" assistant" in normalized or " hermes" in normalized) and not any(
        marker in normalized for marker in _ASSISTANT_DURABLE_MARKERS
    ):
        return False

    return True

Then, inside process_representation_tasks_batch(...), add:

observations.explicit = [
    obs
    for obs in observations.explicit
    if is_durable_observation(obs.content, observed)
]

Place that immediately after assign_observation_metadata(...):

assign_observation_metadata(
    observations,
    observed,
    latest_message.session_name,
    latest_message.created_at,
)

observations.explicit = [
    obs
    for obs in observations.explicit
    if is_durable_observation(obs.content, observed)
]

if observations.is_empty() or not message_ids:
    ...

This is intentionally conservative. Missing a marginal fact is much better than permanently storing junk and having it injected into every future conversation.

6. Be Extra Strict for Assistant/Agent Peers

If the observed peer is the assistant itself, almost everything should be rejected.

Useful assistant-peer observations are rare. They should generally be stable deployment facts, such as:

Hermes is configured to use Honcho as its memory provider.

Do not store:

Hermes updated a file.
Hermes ran tests.
Hermes created a skill.
Hermes plans to open a PR.
Hermes found a log message.

Those are not identity or memory. They are build artifacts wearing a fake mustache.

The assistant-peer strictness is implemented in the same is_durable_observation(...) function in:

src/deriver/deriver.py

Specifically this branch:

observed_lower = observed.lower().strip()
if observed_lower in {"hermes", "assistant", "ai", "agent"}:
    return any(marker in normalized for marker in _ASSISTANT_DURABLE_MARKERS)
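Isolated from the rest of the filter, that branch behaves as follows on the examples in this section. The marker tuple is copied from the patch above; keep_assistant_observation is a hypothetical wrapper that applies only the assistant-peer rule.

```python
_ASSISTANT_DURABLE_MARKERS = (
    "wants the assistant",
    "prefers the assistant",
    "does not want the assistant",
    "hermes setup",
    "hermes uses",
    "hermes runs",
    "hermes is configured",
)


def keep_assistant_observation(content: str) -> bool:
    """Apply only the assistant-peer rule: keep if any durable marker appears."""
    normalized = f" {content.lower().strip()} "
    return any(marker in normalized for marker in _ASSISTANT_DURABLE_MARKERS)


examples = [
    "Hermes is configured to use Honcho as its memory provider.",
    "Hermes updated a file.",
    "Hermes ran tests.",
    "Hermes created a skill.",
    "Hermes plans to open a PR.",
    "Hermes found a log message.",
]
results = [keep_assistant_observation(text) for text in examples]
print(results)  # [True, False, False, False, False, False]
```

Only the deployment fact survives. Note that the markers are specific to an assistant named Hermes; a differently named assistant would need its own entries.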

7. Improve the Representation Schema Descriptions

The schema descriptions for explicit observations were also tightened.

File changed

src/utils/representation.py

Where to change it

Update the description= text for the explicit fields in:

PromptRepresentation.explicit
Representation.explicit

Both are in src/utils/representation.py.

Instead of describing explicit observations as literal facts or clear paraphrases, describe them as durable, future-useful observations and explicitly exclude:

one-off requests
questions
task progress
assistant actions
logs
tests
temporary troubleshooting state

Example replacement description:

explicit: list[ExplicitObservationBase] = Field(
    description=(
        "Durable, future-useful observations about the observed peer only. "
        "Exclude one-off requests, questions, task progress, assistant actions, "
        "logs, tests, and temporary troubleshooting state. Prefer [] over "
        "low-value observations."
    ),
    default_factory=list,
)

This helps because tool/function schemas are part of what the model sees. If the schema says “extract facts literally stated by the user,” the model will do that. If the schema says “durable observations only,” the model has a much better target.

What “Good” Looks Like

Good durable observations:

The user prefers concise, direct answers.
The user uses Fastmail for email and iCloud for calendar.
The user wants confirmation before file edits.
The user's homelab Active Directory domain is home.example.com.
The user stores secrets in .env files rather than hardcoding them in config files.

Bad observations:

The user asked to check Honcho status.
The user asked whether the queue was draining.
The assistant updated prompts.py.
The user sent a test email.
The user wants to review this conversation.
The deriver backlog has 449 pending tasks.
The assistant said the fix is PR-ready.

The good observations change future behavior. The bad ones merely summarize what happened.
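A useful sanity check is to run the deterministic filter from section 5 against these two lists. The sketch below repeats that filter with shortened constant names so it runs on its own, and it assumes the observed peer is named "user".

```python
_TRANSIENT = (
    " asked ", " asks ", " asked whether ", " asked to ", " requested ",
    " wants ", " wanted ", " plans ", " is testing ", " shared ",
    " responded ", " will come back", " test email", " test message",
    " review the conversation", " conversation above", " queue", " deriver",
    " prompt", " backlog", " stashed", " restored", " ran command",
    " updated file", " pr-ready", " commit", " logs",
)
_HARD_REJECT = (
    "review the conversation", "conversation above", "test email",
    "test message", "asked whether", "asked to stash", "queue is draining",
    "prompts.py patched",
)
_DURABLE = (
    " prefers ", " uses ", " runs ", " owns ", " has ", " is configured ",
    " is using ", " email address", " domain", " homelab",
    " active directory", " recurring", " default", " should be assumed",
    " does not want", " wants the assistant", " prefers the assistant",
)
_ASSISTANT_DURABLE = (
    "wants the assistant", "prefers the assistant",
    "does not want the assistant", "hermes setup", "hermes uses",
    "hermes runs", "hermes is configured",
)


def is_durable_observation(content: str, observed: str) -> bool:
    """Same logic as the section 5 filter, with shortened constant names."""
    normalized = f" {content.lower().strip()} "
    if not normalized.strip():
        return False
    if any(p in normalized for p in _HARD_REJECT):
        return False
    if observed.lower().strip() in {"hermes", "assistant", "ai", "agent"}:
        return any(m in normalized for m in _ASSISTANT_DURABLE)
    transient = any(p in normalized for p in _TRANSIENT)
    durable = any(m in normalized for m in _DURABLE)
    if transient and not durable:
        return False
    mentions_assistant = " assistant" in normalized or " hermes" in normalized
    if mentions_assistant and not any(m in normalized for m in _ASSISTANT_DURABLE):
        return False
    return True


GOOD = [
    "The user prefers concise, direct answers.",
    "The user uses Fastmail for email and iCloud for calendar.",
    "The user wants confirmation before file edits.",
    "The user's homelab Active Directory domain is home.example.com.",
    "The user stores secrets in .env files rather than hardcoding them in config files.",
]
BAD = [
    "The user asked to check Honcho status.",
    "The user asked whether the queue was draining.",
    "The assistant updated prompts.py.",
    "The user sent a test email.",
    "The user wants to review this conversation.",
    "The deriver backlog has 449 pending tasks.",
    "The assistant said the fix is PR-ready.",
]
good_results = [is_durable_observation(text, "user") for text in GOOD]
bad_results = [is_durable_observation(text, "user") for text in BAD]
print(good_results)  # [True, True, False, True, True]
print(bad_results)   # [False, False, False, False, False, True, False]
```

Two results differ from the lists. "The user wants confirmation before file edits." is dropped because " wants " matches a transient pattern and no durable marker offsets it, which is the kind of false negative the filter accepts by design. "The deriver backlog has 449 pending tasks." is kept because the broad " has " marker offsets the " deriver" and " backlog" matches. Cases like these are good candidates for regression tests when tuning the marker lists against your own transcripts.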

Implementation Checklist

If you want to implement the same changes, the main files are:

src/deriver/deriver.py
src/deriver/prompts.py
src/utils/representation.py

Optional but recommended test coverage can go somewhere like:

tests/utils/test_observation_quality.py

Include regression tests for at least:

stripping tagged injected memory context: <memory-context>...</memory-context>
stripping raw compaction handoffs: [CONTEXT COMPACTION — REFERENCE ONLY]...
rejecting transient observations like “asked to…”, “queue is draining”, “updated file”, and test-message/task-progress facts
retaining durable observations like stable preferences, recurring tools, long-lived configuration, domains, addresses, and homelab conventions

Minimum implementation steps:

In src/deriver/deriver.py, import re.
In src/deriver/deriver.py, add strip_memory_context(...) and call it while building formatted_messages inside process_representation_tasks_batch(...). It should strip both <memory-context>...</memory-context> and raw [CONTEXT COMPACTION — REFERENCE ONLY]... blocks.
In src/deriver/prompts.py, rewrite minimal_deriver_prompt(...) to ask for durable 30-day memory instead of all explicit facts.
In src/deriver/deriver.py, add is_durable_observation(...) near the top of the file.
In src/deriver/deriver.py, apply the filter to observations.explicit inside process_representation_tasks_batch(...), after assign_observation_metadata(...) and before observations.is_empty() is checked.
In src/utils/representation.py, update PromptRepresentation.explicit and Representation.explicit descriptions to match the durable-memory behavior.
Set DERIVER_FLUSH_ENABLED=true for low-volume personal-agent workloads.
Recreate containers with docker compose up --detach --force-recreate api deriver.

Operational Checklist

If Honcho is running but producing poor memory, check this sequence:

Is the API reachable?
```
curl http://127.0.0.1:8000/docs
```
Are the containers running?
```
docker ps | grep honcho
```
Are model and embedding providers configured with real keys?

Check API and deriver logs. Provider errors often show up there before they show up clearly in the client.

Did you recreate containers after editing .env?

docker compose up --detach --force-recreate api deriver

Is deriver flushing enabled for low-volume personal-agent sessions?
```
DERIVER_FLUSH_ENABLED=true
```
Is recalled memory stripped before derivation?

Do not let <memory-context>...</memory-context> or raw [CONTEXT COMPACTION — REFERENCE ONLY]... summaries become fresh memory.

Does the deriver prompt ask for durable memory, not all facts?

“All explicit facts” is too broad for agent transcripts.

Is there a deterministic quality filter after the LLM?

Use a conservative reject filter for transient/task-progress observations in src/deriver/deriver.py.

Is the peer card actually populated?

Inspect peers.internal_metadata or call the peer-card endpoint directly. Search/context working does not prove the profile/card layer is populated.

The Main Lesson

Honcho worked much better once we stopped treating every transcript fact as memory.

The useful mental model is:

Conversation transcript != long-term memory
Tool output != user preference
Recalled memory != new evidence
Assistant task history != identity
Temporary troubleshooting state != durable fact

A memory system should be lossy. It should forget aggressively. The goal is not to preserve the conversation; the goal is to preserve the small set of facts that will make future conversations better.

Once the deriver was made conservative, injected memory was stripped, and small work units were flushed, Honcho became dramatically more useful.

brav0charlie/getting-useful-results-from-self-hosted-honcho.md
