LLMs are stateless and non-deterministic. Every decision an agent makes is determined entirely by the tokens currently in its context window. The implication is stark: better tokens in, better tokens out. And critically, more tokens does not mean better outcomes. DEV Community
Performance degrades beyond roughly 40% context utilization. As context usage grows, model quality degrades, often beginning around 40% of the window depending on task complexity. Common culprits: large tool outputs (JSON blobs, UUIDs, logs), unfiltered file dumps, long correction-loop histories, and MCPs dumping irrelevant data. Once you're in the dumb zone, model quality doesn't matter. Alexlavaee | DEV Community
If a conversation's pattern is mistake, correction, mistake, correction, the most likely continuation is another mistake. Bad trajectories reinforce failure modes. This is why restarting sessions or compressing context often beats continued correction. DEV Community
This is the core practice. The philosophy is to spend as much time as possible on intentional context management. Instead of dragging an ever-growing conversation forward, you summarize the current state into a compact markdown artifact, validate it as a human, and start a fresh context seeded with that artifact. You compact relevant files, verified architectural behavior, decisions made, and constraints. You strip raw logs, tool traces, and repetitive error chains. Compaction converts exploration into a one-time cost instead of a recurring tax. Medium | LinearB