The Anthropic API ignores cache_control markers placed on tools in the tools array, preventing efficient caching of system prompts and tool definitions across requests.
When placing a cache_control marker on a tool definition:
{
"tools": [
{
"name": "execute_bash_code",
"description": "...",
"input_schema": {...},
"cache_control": {
"type": "ephemeral"
}
}
]
}
Expected behavior: the API should cache all content up to and including this tool (system prompt + tools), allowing subsequent requests with identical system/tools to read from this cache.
Actual behavior: the cache_control marker on tools[0] is completely ignored. Each request creates a new cache instead of reusing the existing one.
- Model: claude-sonnet-4-5-20250929
- API version: anthropic-version: 2023-06-01
- Tested with and without the anthropic-beta: prompt-caching-2024-07-31 header (no difference)
Request 1:
{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"system": [
{
"type": "text",
"text": "# System Prompt\n\n[~1000 token system prompt]"
}
],
"tools": [
{
"name": "execute_bash_code",
"description": "Execute bash code...",
"input_schema": {
"type": "object",
"properties": {
"code": {
"type": "string",
"description": "The bash code to execute"
}
},
"required": ["code"]
},
"cache_control": {
"type": "ephemeral"
}
}
],
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "system_report: terminals=[], cycle=1",
"cache_control": {
"type": "ephemeral"
}
},
{
"type": "text",
"text": "calculate sqrt(42)"
}
]
}
]
}
Response 1 Usage:
{
"input_tokens": 9,
"cache_creation_input_tokens": 1072,
"cache_read_input_tokens": 0
}
Request 2 (same system prompt, same tools, different message content):
{
"model": "claude-sonnet-4-5-20250929",
"max_tokens": 1024,
"system": [...], // IDENTICAL to Request 1
"tools": [...], // IDENTICAL to Request 1, including cache_control
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "calculate sqrt(42)"
},
{
"type": "text",
"text": "assistant_response: {type: tool_use, ...}"
},
{
"type": "text",
"text": "system_report: terminals=[...], cycle=2",
"cache_control": {
"type": "ephemeral"
}
}
]
}
]
}
Response 2 Usage (BROKEN):
{
"input_tokens": 4,
"cache_creation_input_tokens": 1131, // ❌ Creating NEW cache
"cache_read_input_tokens": 0 // ❌ Not reading from existing cache
}
Expected Response 2 Usage:
{
"input_tokens": 4,
"cache_creation_input_tokens": 60, // Only new message content
"cache_read_input_tokens": 1072 // Read system + tools from cache
}
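The gap between the observed and expected usage can also be checked programmatically. The following is a minimal sketch, assuming the usage object has already been parsed into a Python dict with the three fields shown above; the helper name cache_status is hypothetical and not part of any SDK.

```python
def cache_status(usage: dict) -> str:
    """Classify a usage block by what the prompt cache did for that request."""
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    if read > 0:
        return "hit"       # at least part of the prefix was served from cache
    if created > 0:
        return "miss"      # a cache entry was written, but nothing was reused
    return "uncached"      # no caching activity at all


# Usage blocks copied from the report above.
observed = {
    "input_tokens": 4,
    "cache_creation_input_tokens": 1131,
    "cache_read_input_tokens": 0,
}
expected = {
    "input_tokens": 4,
    "cache_creation_input_tokens": 60,
    "cache_read_input_tokens": 1072,
}

print(cache_status(observed))  # miss
print(cache_status(expected))  # hit
```

A check like this could be run against the second response in test.sh to flag the regression automatically.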
Workaround: place a stable cache_control marker on the first message content block instead of on a tool definition:
{
"tools": [
{
"name": "execute_bash_code",
// NO cache_control here
}
],
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Priming cache", // Stable first block
"cache_control": {
"type": "ephemeral"
}
},
{
"type": "text",
"text": "system_report: terminals=[], cycle=1"
},
{
"type": "text",
"text": "calculate sqrt(42)"
}
]
}
]
}
Second request, reusing the same stable first block:
{
"tools": [...], // Same tools, no cache_control
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Priming cache", // SAME stable first block
"cache_control": {
"type": "ephemeral"
}
},
{
"type": "text",
"text": "calculate sqrt(42)"
},
// ... rest of message content
]
}
]
}
Response 2 Usage (WORKING):
{
"input_tokens": 7,
"cache_creation_input_tokens": 69, // ✅ Only new content
"cache_read_input_tokens": 1066 // ✅ Reading system + tools
}
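The workaround depends on the first content block being byte-for-byte identical in every request. One way to guarantee that is to build messages through a single helper. This is a sketch only: PRIMING_BLOCK and build_user_message are hypothetical names chosen for illustration, and the block text is taken from the example above.

```python
import copy

# Stable first block; it must not change between requests.
PRIMING_BLOCK = {
    "type": "text",
    "text": "Priming cache",
    "cache_control": {"type": "ephemeral"},
}


def build_user_message(dynamic_texts):
    """Return a user message whose first block is the stable cache point."""
    content = [copy.deepcopy(PRIMING_BLOCK)]
    # Dynamic blocks follow the cache point and carry no cache_control.
    content.extend({"type": "text", "text": text} for text in dynamic_texts)
    return {"role": "user", "content": content}


first = build_user_message(
    ["system_report: terminals=[], cycle=1", "calculate sqrt(42)"]
)
second = build_user_message(["calculate sqrt(42)"])

# Both requests start with the same block, so the cached prefix can match.
print(first["content"][0] == second["content"][0])
```

The deep copy keeps callers from accidentally mutating the shared block, which would silently change the prefix and defeat the cache.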
This bug forces applications to:
- Add artificial "cache priming" content blocks to messages
- Maintain these stable blocks across all requests
- Pollute the conversation context with non-semantic content
Without the workaround, applications using tools cannot efficiently cache their system prompts and tool definitions, resulting in:
- 10x higher API costs for cached content that should be reused
- Slower response times due to unnecessary cache creation on every request
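To put the cost impact in concrete terms, the two Response 2 usage blocks above can be compared in base-input-token equivalents. The multipliers below are assumptions for illustration (cache writes billed at 1.25x and cache reads at 0.1x the base input price); check the current pricing page before relying on them.

```python
# Assumed billing multipliers relative to the base input-token price.
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: cache creation costs 25% extra
CACHE_READ_MULTIPLIER = 0.10   # assumption: cache reads cost 10% of base


def input_cost_units(usage: dict) -> float:
    """Input cost of one request, in base-input-token equivalents."""
    return (
        usage["input_tokens"]
        + usage["cache_creation_input_tokens"] * CACHE_WRITE_MULTIPLIER
        + usage["cache_read_input_tokens"] * CACHE_READ_MULTIPLIER
    )


# Usage blocks copied from the report above.
broken = {
    "input_tokens": 4,
    "cache_creation_input_tokens": 1131,
    "cache_read_input_tokens": 0,
}
expected = {
    "input_tokens": 4,
    "cache_creation_input_tokens": 60,
    "cache_read_input_tokens": 1072,
}

ratio = input_cost_units(broken) / input_cost_units(expected)
print(f"broken/expected input cost ratio: {ratio:.1f}x")
```

Under these assumed multipliers the second request costs roughly 7.6x more on the input side than it would with a cache hit, and the same overhead repeats on every later request.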
A complete reproduction test suite is available:
- test.sh - Demonstrates the broken behavior with tools[0].cache_control
- test_fix.sh - Demonstrates the workaround with message content cache points
- tools.json - Tool definition with cache_control (ignored)
- tools_fix.json - Tool definition without cache_control (for workaround)
- system.md - System prompt (~1000 tokens)
- The presence or absence of the anthropic-beta: prompt-caching-2024-07-31 header makes no difference to this behavior
- Cache points on message content blocks work correctly
- The bug specifically affects cache points placed on elements of the tools array
- Total request size exceeds the 1024 token minimum for cache activation