Context Engineering for Agents - Team Insights & Resources

  • Video Title: Context Engineering for Agents - Lance Martin, LangChain
  • Channel: Latent Space
  • URL: https://youtu.be/_IlTcWciEC4
  • Duration: ~70 minutes
  • Date Transcribed: 2025-09-14
  • Transcription Method: YouTube Captions


Speakers

  • Alessio: Co-host, founder of Kernel Labs
  • Swyx: Co-host, founder of Smol AI
  • Lance Martin: Senior staff engineer at LangChain, creator of the open deep research agent

Key Topics

  • Context engineering fundamentals and definitions
  • Multi-agent architecture trade-offs
  • Retrieval strategies for coding agents
  • Context management techniques (offloading, compression, isolation)
  • The bitter lesson applied to AI engineering
  • Memory vs context engineering distinctions

Summary

Lance Martin discusses context engineering - the challenge of feeding language models the right context at each step in agent workflows. He covers five key strategies: offloading context to external storage, retrieving relevant information, reducing/compressing context, isolating context across sub-agents, and caching. The conversation explores trade-offs between structured vs. agentic approaches, emphasizing how rapidly improving model capabilities require constantly reassessing architectural assumptions.

Extracted from Latent Space Podcast with Lance Martin (LangChain) - September 2025

Executive Summary

This document synthesizes critical insights from Lance Martin's discussion on context engineering - a discipline that's becoming essential for building production-quality AI agents. The strategies outlined here directly impact agent performance, cost efficiency, and reliability at scale.

Core Concept: Context Engineering

Definition: "The challenge of feeding a language model just the right context for the next step" - particularly critical for agents making hundreds of tool calls.

The Problem: Naive agent implementations accumulate massive context (500K+ tokens, $1-2 per run) from tool calls, hitting context limits and suffering performance degradation from context rot.

Origin: Popularized by Andrej Karpathy in mid-2025; the term took hold as teams converged on the shared challenges of building agents.

Five Essential Context Management Strategies

1. Offloading Context

  • Don't: Naively pass full tool call results back to the model
  • Do: Write results to disk or external storage; return summaries or URLs
  • Implementation: Use the file system or agent state as externalized memory (see the sketch after this list)
  • Key insight: Compress aggressively, but keep summaries high-recall so the agent can decide what to retrieve later
  • Example: Manus recommends using the file system as externalized memory rather than pushing raw context to the model
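
A minimal sketch of this offloading pattern, assuming a hypothetical agent_workspace/ scratch directory and a caller-supplied summarize function (for example, a cheap LLM call); the names are illustrative, not from the talk:

```python
import hashlib
from pathlib import Path

WORKSPACE = Path("agent_workspace")  # hypothetical scratch directory
WORKSPACE.mkdir(exist_ok=True)

def offload_tool_result(tool_name: str, raw_result: str, summarize) -> dict:
    """Write the full tool output to disk and hand the model only a pointer."""
    digest = hashlib.sha1(raw_result.encode()).hexdigest()[:12]
    path = WORKSPACE / f"{tool_name}-{digest}.txt"
    path.write_text(raw_result)  # raw context is preserved, never lost
    return {
        "summary": summarize(raw_result),  # compressed, high-recall view for the window
        "path": str(path),                 # reference the agent can re-read on demand
    }
```

The model sees only the summary and the path; if the summary suggests the file matters, the agent reads it back with a file tool.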

2. Retrieval Strategies

Two competing approaches emerge:

Complex Approach (Windsurf):

  • Code chunking along semantic boundaries
  • Embeddings and vector search
  • Knowledge graphs
  • Re-ranking pipelines

Simple Approach (Claude Code):

  • No indexing whatsoever
  • Agentic search with basic file tools (grep, find)
  • Just-in-time context retrieval

Winner: Simple agentic search often outperforms complex RAG pipelines

Practical Implementation:

  • Use llms.txt files with LLM-generated descriptions of docs and code
  • Let the agent decide what context to retrieve based on the task
  • Benchmarks showed the simple approach, paired with good descriptions, to be extremely effective (see the sketch below)
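
A sketch of the simple approach: register plain file-search tools and let the agent retrieve context just in time. The truncation limit is an illustrative assumption, not a recommendation from the talk:

```python
import subprocess

def grep(pattern: str, directory: str = ".") -> str:
    """Case-insensitive recursive grep; returns matching lines with file names."""
    result = subprocess.run(
        ["grep", "-rin", pattern, directory],
        capture_output=True, text=True,
    )
    return result.stdout[:4000]  # truncate defensively before it enters the context

def find_files(name_glob: str, directory: str = ".") -> str:
    """List files matching a glob so the model can navigate the repo itself."""
    result = subprocess.run(
        ["find", directory, "-name", name_glob],
        capture_output=True, text=True,
    )
    return result.stdout[:4000]

# Registered as tools, these let the agent retrieve context just in time
# instead of relying on a pre-built vector index.
TOOLS = {"grep": grep, "find_files": find_files}
```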

3. Context Reduction/Compression

  • When: At tool call boundaries and near context limits (Claude Code warns and compacts around 95%)
  • Caution: Irreversible pruning is risky - always preserve the raw context externally
  • Quality matters: Roughly a 2x performance difference between automated and carefully curated compression
  • Cognition approach: Fine-tuned models for summarization in coding contexts
  • Best practice: Offload first, then compress - never lose the original context (see the sketch below)
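
A minimal compaction sketch, where count_tokens and summarize_llm stand in for your provider's tokenizer and a summarization call, and raw messages are assumed to be offloaded already (so the compression stays reversible):

```python
def maybe_compact(messages: list[dict], count_tokens, summarize_llm,
                  limit: int = 200_000, threshold: float = 0.95) -> list[dict]:
    """Compact the message history when it nears the context limit."""
    if len(messages) <= 6 or count_tokens(messages) < threshold * limit:
        return messages
    head, tail = messages[:1], messages[-5:]  # keep system prompt + recent turns
    summary = summarize_llm(messages[1:-5])   # compress the middle of the trajectory
    note = {"role": "user", "content": f"Summary of earlier work:\n{summary}"}
    return head + [note] + tail
```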

4. Context Isolation (Multi-Agent)

Effective Use Cases:

  • Parallel read-only tasks (research, data gathering)
  • Context collection with single-agent synthesis
  • Clear separation of concerns

Problematic Use Cases:

  • Sub-agents making conflicting write decisions
  • Highly coordinated tasks (especially coding)
  • Complex state dependencies between agents

Key Insight: Multi-agent works for "read" tasks, single agent for "write" tasks (see the sketch below)

Cognition's Counter-Argument: Avoid multi-agent due to:

  • Difficulty communicating sufficient context between agents
  • Implicit conflicting decisions
  • Coordination complexity
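
A minimal sketch of the read/write split, with llm standing in for any chat-model callable (an assumption, not an API from the talk): sub-agents run in parallel and only read, while a single agent makes every write decision.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(llm, question: str) -> str:
    """A read-only sub-agent: researches one question in its own context window."""
    return llm(f"Research this and return compressed, high-recall findings:\n{question}")

def research_then_write(llm, questions: list[str]) -> str:
    # Read phase: isolated contexts, safe to parallelize (no conflicting decisions).
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda q: run_subagent(llm, q), questions))
    # Write phase: one agent synthesizes, so every decision shares one context.
    joined = "\n\n".join(findings)
    return llm(f"Write one coherent report from these findings:\n{joined}")
```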

5. Caching

  • Benefit: Reduces latency and cost significantly
  • Current State: Most providers now auto-cache (OpenAI, Anthropic, Gemini)
  • Important Limitation: Doesn't solve context rot - long context still degrades performance
  • Implementation: Anthropic's once-explicit cache headers are now mostly automatic (see the sketch below)
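
Where explicit markers are still useful, a sketch of Anthropic's cache_control content-block field; the model alias and prompt file are illustrative, and provider docs should be checked since caching behavior changes quickly:

```python
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
long_stable_prompt = Path("SYSTEM_PROMPT.md").read_text()  # hypothetical large, stable prefix

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative alias; any caching-capable model works
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": long_stable_prompt,
        "cache_control": {"type": "ephemeral"},  # mark the stable prefix as cacheable
    }],
    messages=[{"role": "user", "content": "Summarize the repository layout."}],
)
print(response.content[0].text)
```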

The Bitter Lesson Applied to AI Engineering

Core Principle

As models improve exponentially, remove structural constraints that worked at lower capability levels. More general approaches with less hardcoded structure ultimately win.

Lance's Real-World Experience

Evolution of Open Deep Research Agent:

  1. Early 2024: Highly structured workflow

    • No tool calling ("everyone knows it's unreliable")
    • Hardcoded research sections
    • Parallel writing per section
  2. Mid 2024: Adapted to improving models

    • Added tool calling as reliability improved
    • Agent-based approach
    • Still structured sections
  3. Late 2024: Embraced generality

    • Removed sub-agent writing (communication issues)
    • Single synthesis step
    • Full agentic approach

Result: Now the best-performing open-source deep research agent

Strategic Implications

For Product Development:

  • Build products that "don't quite work yet" - model improvements will unlock them
  • Example: Cursor didn't work well until Claude 3.5 Sonnet hit
  • Timing the capability curve is crucial

For Architecture:

  • Add minimum structure needed for current model capabilities
  • Plan to remove constraints as models improve
  • Avoid over-engineering for current limitations

Multi-Agent vs Single-Agent Trade-offs

Research Tasks (Multi-Agent Works)

  • Anthropic's approach: Parallel sub-agents for research collection
  • Final step: Single agent for report writing
  • Why it works: Clear read/write separation, no conflicting decisions

Coding Tasks (Single-Agent Preferred)

  • Cognition's approach: Linear agent, careful summarization at boundaries
  • Challenge: Sub-agents making implicit architectural decisions
  • Conflicts: Compilation issues when merging sub-agent outputs
  • Exception: Claude Code allows sub-agents (but remains controversial)

Memory vs Context Engineering

Memory Dimensions

  • Writing: When does system capture memories? (Manual vs Automatic)
  • Reading: How does system retrieve memories? (Manual vs Automatic)

Approaches Compared

Claude Code (Simple):

  • Reading: Automatic (pulls in all CLAUDE.md files)
  • Writing: Manual (user explicitly saves to memory)
  • Philosophy: Very simple, user-controlled

ChatGPT (Complex):

  • Reading: Automatic (AI-driven retrieval)
  • Writing: Automatic (AI decides what to remember)
  • Risk: Memory retrieval gone wrong (Simon Willison's location example)

Best Practice: Human-in-the-Loop

  • Sweet spot: Memory updates from user corrections
  • Example: Email assistant learns tone preferences from user edits
  • Implementation: An LLM reflects on the corrections and updates the stored instructions (see the sketch below)
  • Benefit: System improves over time through natural feedback
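
A sketch of that reflection step, with llm again standing in for any chat-model callable; the prompt wording is illustrative:

```python
def update_instructions(llm, instructions: str, draft: str, user_edit: str) -> str:
    """Reflect on a user's correction and fold the lesson back into stored instructions."""
    prompt = (
        "Current assistant instructions:\n" + instructions + "\n\n"
        "Assistant draft:\n" + draft + "\n\n"
        "User's edited version:\n" + user_edit + "\n\n"
        "What preference does the edit reveal? Return the updated instructions, "
        "changing only what the edit justifies."
    )
    return llm(prompt)
```

Run after each user edit, this lets the stored instructions improve over time without the user ever managing memory explicitly.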

Framework Philosophy: Low-Level vs High-Level

Anti-Framework Arguments

Valid concerns about high-level abstractions:

  • "from framework import agent" - don't know what's under the hood
  • Hard to modify when models improve
  • We're in the "HTML era" of agents (very early)

Pro-Framework Arguments

Value of low-level orchestration:

  • LangGraph approach: Nodes, edges, and state as composable primitives (see the sketch below)
  • Enterprise need: Standardized tooling reduces cognitive load
  • Shopify example: Built "Roast" (essentially LangGraph) for internal consistency
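
A minimal sketch of LangGraph's primitives - nodes, edges, and shared state - with stubbed node functions standing in for real LLM calls:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    findings: str
    report: str

def research(state: State) -> dict:
    return {"findings": f"notes about {state['question']}"}  # stub for a real LLM call

def write(state: State) -> dict:
    return {"report": f"Report based on: {state['findings']}"}  # stub

builder = StateGraph(State)
builder.add_node("research", research)
builder.add_node("write", write)
builder.add_edge(START, "research")
builder.add_edge("research", "write")
builder.add_edge("write", END)
graph = builder.compile()

print(graph.invoke({"question": "context engineering"}))
```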

MCP (Model Context Protocol) Benefits

  • Standardization: Prevents tool integration chaos
  • Prompts included: Server can specify how to use it effectively
  • Resources: Rich context beyond just tool calls
  • Sampling: An underrated capability for complex interactions (a minimal server sketch follows)
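
A minimal MCP server sketch using the Python SDK's FastMCP helper; the server name, tool body, and resource URI are illustrative placeholders:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-server")  # hypothetical server name

@mcp.tool()
def search_docs(query: str) -> str:
    """Search project documentation; replace the body with real lookup logic."""
    return f"Results for: {query}"

@mcp.resource("docs://llms.txt")
def llms_txt() -> str:
    """Expose the llms.txt index as a resource, not just a tool."""
    return "# Project docs\n- architecture.md: high-level design\n"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```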

Practical Recommendations

Immediate Implementation

  1. Audit current agents for naive context accumulation patterns
  2. Implement offloading for token-heavy operations
  3. Try agentic search before building complex RAG systems
  4. Create llms.txt files for documentation and code understanding
  5. Use human-in-the-loop for memory and preference capture

Strategic Architecture Decisions

  1. Start simple - complex pipelines often lose to agentic approaches
  2. Plan for model improvements - avoid over-engineering current solutions
  3. Separate read/write tasks when considering multi-agent
  4. Preserve raw context when implementing compression
  5. Use low-level frameworks over high-level abstractions

Monitoring and Optimization

  • Track token usage across agent runs
  • Monitor context windows - implement alerts at 80-90% utilization (see the sketch after this list)
  • Measure compression quality - test summarization effectiveness
  • Benchmark retrieval - simple vs complex approaches
  • A/B test caching strategies
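
A minimal budget-check sketch for those alert thresholds; the cutoffs and return values are illustrative assumptions:

```python
def check_context_budget(used_tokens: int, limit: int, warn_at: float = 0.8,
                         alert_at: float = 0.9) -> str | None:
    """Return a warning level as the context window fills up."""
    ratio = used_tokens / limit
    if ratio >= alert_at:
        return "alert"  # compact or offload before the next tool call
    if ratio >= warn_at:
        return "warn"   # start summarizing low-value history
    return None
```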

Tools and Technologies

Recommended Stack

  • MCP servers for standardized tool integration
  • LangGraph for low-level orchestration (if framework needed)
  • llms.txt pattern for documentation integration
  • Provider caching (OpenAI, Anthropic, Gemini auto-cache)

Avoid

  • High-level agent abstractions that obscure implementation
  • Complex RAG pipelines without benchmarking against simple approaches
  • Irreversible context pruning without external backup

Business Impact for XtendOps

Cost Optimization

  • Proper context management: 500K tokens → manageable levels
  • Caching strategies: Significant latency and cost reduction
  • Smart offloading: Avoid repeated processing of same context

Performance Gains

  • Context engineering: Dramatically improves agent reliability
  • Right-sized context: Better model performance, fewer hallucinations
  • Strategic architecture: Systems that improve with model capabilities

Scalability Enablers

  • Production-ready agents: Handle hundreds of tool calls efficiently
  • Multi-client deployment: Standardized patterns across implementations
  • Maintenance reduction: Less complex systems are easier to debug

Application Areas

  1. Client Workflow Automation: Multi-step reasoning with proper context management
  2. Code Generation Tools: Development teams need reliable, fast context retrieval
  3. Business Intelligence: Research and synthesis require smart context strategies
  4. Customer Support: Memory and context management for ongoing conversations

Key Takeaways

  1. Context engineering is the new prompt engineering - essential for production agents
  2. Simple often beats complex - agentic search vs elaborate RAG pipelines
  3. Model improvements change everything - build general systems, remove constraints
  4. Multi-agent has clear use cases - good for parallel reads, bad for coordinated writes
  5. Human-in-the-loop enables learning - best way to improve agent performance over time


This document serves as a strategic guide for implementing context engineering practices at XtendOps and related AI agent development efforts. Regular updates recommended as the field evolves rapidly.
