You're right — the current approach has a fundamental evolution problem.
The issue isn't just where feedback is stored. It's that a monolithic prompt is a black box — when it picks the wrong post, you don't know which step of the reasoning failed. Was it bad at judging niche fit? Bad at spotting engagement opportunity? Bad at matching your expertise? You can't tell, so you can't fix it precisely. Feedback goes into a general "try harder next time" pile that doesn't map to anything structural.
## Why LangGraph is the right direction
LangGraph forces you to break the agent into explicit nodes with observable, structured outputs. Each node reads from shared state and writes back to it. This means every intermediate decision is logged — not just the final post pick.
For the Reddit hunter, the graph would look like:
```
load_preferences → fetch_posts → score_posts → select_best → draft_comment → quality_check → [INTERRUPT]
```
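A minimal sketch of the shared state those nodes would read and write, plus one node in the LangGraph style (receive state, return only the keys you change). Field names here are assumptions, not the actual schema:

```python
from typing import Optional, TypedDict


class HunterState(TypedDict, total=False):
    # Hypothetical state schema for the Reddit-hunter graph.
    subreddit: str
    run_date: str
    preferences: dict            # liked/disliked history loaded from MongoDB
    posts: list                  # raw posts from fetch_posts
    scores: list                 # per-post score records from score_posts
    selected_post: Optional[dict]
    draft: Optional[str]


def select_best(state: HunterState) -> dict:
    # Example node: pick the post with the highest total score and
    # return a partial state update, as LangGraph nodes do.
    best = max(state["scores"], key=lambda s: sum(s["scores"].values()))
    return {"selected_post": best}
```

In a real build these nodes would be registered on a `StateGraph`; the point of the sketch is that every node's output is a plain, inspectable dict.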
`load_preferences` — reads liked/disliked history from MongoDB before anything runs. The agent starts every run aware of past mistakes.
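A sketch of that node. In the real agent the history would come from MongoDB (e.g. via pymongo); here the fetch is injected as a callable so the node logic stays self-contained, and the record shape (`post_id`, `verdict`) is an assumption:

```python
def load_preferences(state: dict, fetch_history=None) -> dict:
    # fetch_history stands in for a MongoDB query; record fields
    # ("post_id", "verdict") are illustrative assumptions.
    history = fetch_history() if fetch_history else []
    liked = [h["post_id"] for h in history if h.get("verdict") == "like"]
    disliked = [h["post_id"] for h in history if h.get("verdict") == "dislike"]
    return {"preferences": {"liked": liked, "disliked": disliked}}
```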
`score_posts` — instead of "LLM pick the best post," each post gets scored on explicit dimensions:
```json
{
  "post_id": "1ri4das",
  "scores": {
    "niche_fit": 0.9,
    "engagement_opportunity": 0.8,
    "expertise_match": 0.85,
    "comment_potential": 0.7
  },
  "rationale": "Organic growth topic, direct overlap with ASO work..."
}
```

These scores are stored in state. When you dislike a post, you're not just disliking a black box — you're disliking a post that had `niche_fit: 0.9`. That's a tunable bug.
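That "tunable bug" can be made concrete: given a stored score record and a human verdict, the dimensions that rated a disliked post highly are the likely culprits. A minimal sketch, with the 0.8 threshold as an arbitrary assumption:

```python
def suspect_dimensions(record: dict, verdict: str, threshold: float = 0.8) -> list:
    # If a post the scorer rated highly gets disliked, the dimensions
    # scoring at or above the threshold are the ones to audit.
    if verdict != "dislike":
        return []
    return sorted(d for d, v in record["scores"].items() if v >= threshold)
```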
`quality_check` — a separate node that re-evaluates the selected post + draft against stored preferences. Can conditionally loop back to `select_best` if the pick is too similar to past dislikes. This is the self-correction loop that a simple prompt can't do.
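In LangGraph that loop-back is just a routing function attached with a conditional edge. Sketched here as plain Python; the "too similar" check below (exact match against past disliked post ids) is a deliberately crude stand-in for whatever preference-similarity measure the real node would use, and the retry cap is an assumption:

```python
def route_after_quality_check(state: dict, max_retries: int = 2) -> str:
    # Returns the name of the next node; a real graph would hand this
    # function to add_conditional_edges on the quality_check node.
    selected = state["selected_post"]
    disliked = set(state.get("preferences", {}).get("disliked", []))
    too_similar = selected["post_id"] in disliked  # crude stand-in
    if too_similar and state.get("retries", 0) < max_retries:
        return "select_best"   # loop back for a different pick
    return "draft_comment"
```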
`INTERRUPT` — LangGraph natively supports pausing for human input. Instead of retroactive feedback, you get pre-publication approval. Every approve/reject becomes an immediate feedback signal, attached to a specific scoring state.
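However the approval arrives (in the real flow, LangGraph's interrupt wired to Telegram), the valuable part is the document you persist when it does: the verdict attached to the exact score record that produced the pick. A sketch of that feedback document, with the field names as assumptions:

```python
import datetime


def record_verdict(state: dict, verdict: str) -> dict:
    # Builds the feedback document to write to MongoDB after the human
    # approves or rejects at the interrupt. Shape is illustrative; the
    # point is that the verdict carries the full scoring state with it.
    return {
        "post_id": state["selected_post"]["post_id"],
        "verdict": verdict,  # "approve" | "reject"
        "scores": state["selected_post"]["scores"],
        "draft": state.get("draft"),
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
```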
## Where this actually enables evolution
- **Aggregate score errors** — after 20 runs, you can query: "which posts had high scores but got disliked?" If `niche_fit` is consistently wrong for SaaS posts, that's a specific dimension to fix in the `score_posts` prompt — not a vague "improve post selection."
- **Node-level prompt tuning** — `score_posts` and `quality_check` have separate prompts. Feedback routes to the right node, so fixing niche detection doesn't risk breaking draft quality.
- **Conditional graph evolution** — add nodes without rewriting the agent. Want the agent to check a post's comment thread before picking? Add a `scan_comments` node between `fetch_posts` and `score_posts`. With a monolithic prompt, you'd have to redesign the whole thing.
- **Preference drift detection** — if your dislikes in month 3 have different patterns than in month 1, you can detect that and re-weight scoring dimensions accordingly. A flat feedback list can't show you that.
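The first of those queries is simple enough to sketch. In production it would be a MongoDB aggregation over the feedback collection; here it runs over in-memory records, and the 0.8 threshold is an arbitrary assumption:

```python
from collections import defaultdict


def dimension_error_rates(feedback: list, threshold: float = 0.8) -> dict:
    # For each scoring dimension: of the posts it rated at or above the
    # threshold, what fraction were disliked? A high rate points at the
    # one dimension whose prompt needs fixing.
    high, bad = defaultdict(int), defaultdict(int)
    for rec in feedback:
        for dim, value in rec["scores"].items():
            if value >= threshold:
                high[dim] += 1
                if rec["verdict"] == "dislike":
                    bad[dim] += 1
    return {dim: bad[dim] / high[dim] for dim in high}
```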
## Honest assessment: is it worth converting now?
The feedback system you just shipped is the right first step — it collects signal. But you're right that it won't evolve the agent well on its own. A LangGraph version would need:
- Python agent (current Reddit hunter is a bash+curl script — would need a full rewrite)
- LangGraph + LangChain dependencies
- Checkpointer (SQLite or Postgres for state persistence across runs)
- The interrupt pattern wired into the Telegram notification flow
That's a meaningful rebuild, not a quick addition. But the payoff is real: structured scores you can analyze, a quality-check loop that self-rejects bad picks, and feedback that maps to specific fixable dimensions rather than general vibes.
**Recommended path:** build the LangGraph version as a replacement runner — same inputs (subreddit, date), same output (MongoDB document), but internal structure is a proper graph. Keep the feedback UI you just built — it feeds Phase 2 either way.
Want me to draft a plan for the LangGraph Reddit hunter?