@p-i-
Last active October 2, 2025 00:35

Anthropic API Bug Report: Cache Control on Tools Array Ignored

Summary

The Anthropic API ignores cache_control markers placed on tools in the tools array, preventing efficient caching of system prompts and tool definitions across requests.

Expected Behavior

When placing a cache_control marker on a tool definition:

{
  "tools": [
    {
      "name": "execute_bash_code",
      "description": "...",
      "input_schema": {...},
      "cache_control": {
        "type": "ephemeral"
      }
    }
  ]
}

The API should cache all content up to and including this tool (system prompt + tools), allowing subsequent requests with identical system/tools to read from this cache.

Actual Behavior

The cache_control marker on tools[0] is completely ignored. Each request creates a new cache instead of reusing the existing one.

Reproduction

Test Environment

  • Model: claude-sonnet-4-5-20250929
  • API Version: anthropic-version: 2023-06-01
  • Tested with and without: anthropic-beta: prompt-caching-2024-07-31 header (no difference)

Request 1 (Cold Start)

{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 1024,
  "system": [
    {
      "type": "text",
      "text": "# System Prompt\n\n[~1000 token system prompt]"
    }
  ],
  "tools": [
    {
      "name": "execute_bash_code",
      "description": "Execute bash code...",
      "input_schema": {
        "type": "object",
        "properties": {
          "code": {
            "type": "string",
            "description": "The bash code to execute"
          }
        },
        "required": ["code"]
      },
      "cache_control": {
        "type": "ephemeral"
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "system_report: terminals=[], cycle=1",
          "cache_control": {
            "type": "ephemeral"
          }
        },
        {
          "type": "text",
          "text": "calculate sqrt(42)"
        }
      ]
    }
  ]
}

Response 1 Usage:

{
  "input_tokens": 9,
  "cache_creation_input_tokens": 1072,
  "cache_read_input_tokens": 0
}

Request 2 (Should Reuse Cache)

Same system prompt, same tools, different message content:

{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 1024,
  "system": [...],  // IDENTICAL to Request 1
  "tools": [...],   // IDENTICAL to Request 1, including cache_control
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "calculate sqrt(42)"
        },
        {
          "type": "text",
          "text": "assistant_response: {type: tool_use, ...}"
        },
        {
          "type": "text",
          "text": "system_report: terminals=[...], cycle=2",
          "cache_control": {
            "type": "ephemeral"
          }
        }
      ]
    }
  ]
}

Response 2 Usage (BROKEN):

{
  "input_tokens": 4,
  "cache_creation_input_tokens": 1131,  // ❌ Creating NEW cache
  "cache_read_input_tokens": 0           // ❌ Not reading from existing cache
}

Expected Response 2 Usage:

{
  "input_tokens": 4,
  "cache_creation_input_tokens": 60,    // Only new message content
  "cache_read_input_tokens": 1072       // Read system + tools from cache
}
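
The usage patterns above can be told apart mechanically from the two cache counters. The following is a minimal sketch and not part of the original test suite; the function name classify_cache_usage and its three labels are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: classify a response by its cache counters.
#   $1 = cache_creation_input_tokens
#   $2 = cache_read_input_tokens
classify_cache_usage() {
  local created=$1 read_tokens=$2
  if [ "$read_tokens" -gt 0 ]; then
    echo "hit"          # at least part of the prefix was served from cache
  elif [ "$created" -gt 0 ]; then
    echo "write-only"   # a cache entry was written, nothing was reused
  else
    echo "uncached"     # caching was not applied at all
  fi
}

classify_cache_usage 1072 0     # Response 1 (cold start)
classify_cache_usage 1131 0     # Response 2 as observed
classify_cache_usage 60 1072    # Response 2 as expected
```

Against a saved response, the two arguments could be filled in with the same jq expressions the test scripts use, for example jq '.usage.cache_creation_input_tokens // 0' response2.json and jq '.usage.cache_read_input_tokens // 0' response2.json.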

Workaround

Place a stable cache_control marker on the first message content block instead of on the tools array:

Request 1 (Workaround)

{
  "tools": [
    {
      "name": "execute_bash_code",
      // NO cache_control here
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Priming cache",  // Stable first block
          "cache_control": {
            "type": "ephemeral"
          }
        },
        {
          "type": "text",
          "text": "system_report: terminals=[], cycle=1"
        },
        {
          "type": "text",
          "text": "calculate sqrt(42)"
        }
      ]
    }
  ]
}

Request 2 (Workaround)

{
  "tools": [...],  // Same tools, no cache_control
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Priming cache",  // SAME stable first block
          "cache_control": {
            "type": "ephemeral"
          }
        },
        {
          "type": "text",
          "text": "calculate sqrt(42)"
        },
        // ... rest of message content
      ]
    }
  ]
}

Response 2 Usage (WORKING):

{
  "input_tokens": 7,
  "cache_creation_input_tokens": 69,    // ✅ Only new content
  "cache_read_input_tokens": 1066       // ✅ Reading system + tools
}
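
The workaround can also be applied mechanically to an existing request body. The following is a minimal sketch using jq, which the test scripts already depend on: it strips cache_control from every tool and prepends a stable priming block to the first message. The function name apply_workaround is hypothetical, and the sketch assumes that tools is an array and that messages[0].content is an array of content blocks.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original test suite).
# Reads a request body on stdin and writes the workaround form to stdout.
# Assumes .tools is an array and .messages[0].content is an array of blocks.
apply_workaround() {
  jq '
    # Drop the cache marker from every tool definition
    .tools |= map(del(.cache_control))
    # Prepend a stable, cache-marked block to the first message
    | .messages[0].content |= (
        [{type: "text", text: "Priming cache", cache_control: {type: "ephemeral"}}] + .
      )
  '
}

# Small stand-in request body for demonstration
echo '{"tools":[{"name":"execute_bash_code","cache_control":{"type":"ephemeral"}}],"messages":[{"role":"user","content":[{"type":"text","text":"calculate sqrt(42)"}]}]}' \
  | apply_workaround
```

A saved request could be converted with apply_workaround < request1.json. The priming text must stay byte-for-byte identical across requests for the cached prefix to match.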

Impact

This bug forces applications to:

  1. Add artificial "cache priming" content blocks to messages
  2. Maintain these stable blocks across all requests
  3. Pollute the conversation context with non-semantic content

Without the workaround, applications using tools cannot efficiently cache their system prompts and tool definitions, resulting in:

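  • On the order of 10x higher input costs for content that should be served from cache: cache reads are billed at a fraction of the base input price, while every request here pays for a fresh cache write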
  • Slower response times due to unnecessary cache creation on every request
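
To make the cost impact concrete, the usage figures reported above can be converted to a rough relative cost. This sketch assumes a cache write is billed at 1.25x and a cache read at 0.1x the base input-token price (the published multipliers for the default short-lived cache at the time of writing; treat them as assumptions and adjust as needed). The function name relative_input_cost is hypothetical. Costs are expressed in hundredths of one base input token so that bash integer arithmetic suffices.

```shell
#!/bin/bash
# Hypothetical helper: relative input cost in hundredths of a base input token.
#   $1 = input_tokens, $2 = cache_creation_input_tokens, $3 = cache_read_input_tokens
# Assumed multipliers: uncached input 1.00x, cache write 1.25x, cache read 0.10x.
relative_input_cost() {
  local input=$1 created=$2 read_tokens=$3
  echo $(( input * 100 + created * 125 + read_tokens * 10 ))
}

observed=$(relative_input_cost 4 1131 0)     # Response 2 as observed
expected=$(relative_input_cost 4 60 1072)    # Response 2 as expected

echo "observed: $observed"
echo "expected: $expected"
# Ratio scaled by 10 to keep one decimal digit with integer math
echo "ratio x10: $(( observed * 10 / expected ))"
```

Under these assumptions the observed second request costs about 7.6 times the expected one in input tokens.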

Test Files

The complete reproduction test suite is included in this gist:

  • test.sh - Demonstrates the broken behavior with tools[0].cache_control
  • test_fix.sh - Demonstrates the workaround with message content cache points
  • tools.json - Tool definition with cache_control (ignored)
  • tools_fix.json - Tool definition without cache_control (for workaround)
  • system.md - System prompt (~1000 tokens)

Additional Notes

  • The presence or absence of the anthropic-beta: prompt-caching-2024-07-31 header makes no difference to this behavior
  • Cache points on message content blocks work correctly
  • The bug specifically affects cache points placed on elements of the tools array
  • Total request size exceeds the 1024 token minimum for cache activation

System Prompt

You are a helpful AI assistant that can execute bash code to help users solve problems.

When asked to perform calculations or tasks, use the execute_bash_code tool to run the appropriate commands.

Be concise and accurate in your responses.

Your Capabilities

You have access to a bash execution environment where you can run commands to:

  • Perform mathematical calculations using tools like bc, awk, or python
  • Query system information using commands like uname, df, ps, and top
  • Process text files using grep, sed, awk, cut, and other text processing utilities
  • Manipulate files and directories with standard Unix commands
  • Execute scripts and programs available in the environment
  • Chain commands together using pipes and redirection
  • Use control flow with conditionals and loops
  • Access environment variables and system resources

Best Practices

When writing bash code:

  1. Always validate inputs before processing them
  2. Use appropriate error handling with set -e or explicit checks
  3. Quote variables properly to handle spaces and special characters
  4. Prefer built-in commands over external programs when possible
  5. Use meaningful variable names for clarity
  6. Add comments to explain complex logic
  7. Test commands in isolation before combining them
  8. Consider edge cases and error conditions
  9. Use appropriate tools for the task (bc for math, awk for text, etc.)
  10. Keep code readable and maintainable

Example Usage Patterns

For calculations:

  • Simple arithmetic: echo $((2 + 2))
  • Floating point: echo "scale=4; 22/7" | bc
  • Advanced math: python3 -c "import math; print(math.sqrt(42))"

For text processing:

  • Pattern matching: grep -E "pattern" file.txt
  • Field extraction: awk '{print $1, $3}' data.txt
  • String replacement: sed 's/old/new/g' file.txt

For system operations:

  • Process listing: ps aux | grep process_name
  • Disk usage: df -h
  • Memory info: free -h
  • Network status: netstat -tuln

Remember to always provide clear output and explain what the code does when presenting results to the user.

test.sh

#!/bin/bash
set -e

# Load API key
. ../../.env

SYSTEM_PROMPT=$(cat system.md)
TOOLS=$(cat tools.json)

echo "========================================="
echo "REQUEST 1: Cold start with system_report"
echo "========================================="

# Build request using jq to handle JSON escaping
REQUEST1=$(jq -n \
  --arg system "$SYSTEM_PROMPT" \
  --argjson tools "$TOOLS" \
  '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": $system
      }
    ],
    "tools": $tools,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "system_report: terminals=[], cycle=1",
            "cache_control": {
              "type": "ephemeral"
            }
          },
          {
            "type": "text",
            "text": "calculate sqrt(42)"
          }
        ]
      }
    ]
  }')

echo "$REQUEST1" | jq . > request1.json
echo "Request saved to request1.json"

curl -s https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: prompt-caching-2024-07-31" \
  -d "$REQUEST1" | jq . > response1.json

echo "Response saved to response1.json"
echo ""
echo "Usage from response1:"
jq '.usage' response1.json
echo ""

sleep 2

echo "========================================="
echo "REQUEST 2: Second request (simulating cycle 2)"
echo "========================================="

REQUEST2=$(jq -n \
  --arg system "$SYSTEM_PROMPT" \
  --argjson tools "$TOOLS" \
  '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": $system
      }
    ],
    "tools": $tools,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "calculate sqrt(42)"
          },
          {
            "type": "text",
            "text": "assistant_response: {type: tool_use, name: execute_bash_code, input: {code: \"echo \\\"scale=2; sqrt(42)\\\" | bc\"}}"
          },
          {
            "type": "text",
            "text": "system_report: terminals=[{pid: 1234, sticky_note: \"calculator\"}], cycle=2",
            "cache_control": {
              "type": "ephemeral"
            }
          }
        ]
      }
    ]
  }')

echo "$REQUEST2" | jq . > request2.json
echo "Request saved to request2.json"

curl -s https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: prompt-caching-2024-07-31" \
  -d "$REQUEST2" | jq . > response2.json

echo "Response saved to response2.json"
echo ""
echo "Usage from response2:"
jq '.usage' response2.json
echo ""

echo "========================================="
echo "ANALYSIS"
echo "========================================="
echo "Request 1 - Cache creation (expected):"
jq '.usage.cache_creation_input_tokens // 0' response1.json
echo ""
echo "Request 1 - Cache read (expected 0):"
jq '.usage.cache_read_input_tokens // 0' response1.json
echo ""
echo "Request 2 - Cache creation (should be small):"
jq '.usage.cache_creation_input_tokens // 0' response2.json
echo ""
echo "Request 2 - Cache read (should be ~same as req1 cache_creation):"
jq '.usage.cache_read_input_tokens // 0' response2.json
echo ""

CACHE_CREATE_1=$(jq '.usage.cache_creation_input_tokens // 0' response1.json)
CACHE_READ_2=$(jq '.usage.cache_read_input_tokens // 0' response2.json)

if [ "$CACHE_READ_2" -gt 0 ]; then
  echo "✅ SUCCESS: Request 2 read $CACHE_READ_2 tokens from cache"
else
  echo "❌ FAILURE: Request 2 did not read from cache (cache_read_input_tokens = $CACHE_READ_2)"
fi

test_fix.sh

#!/bin/bash
set -e

# Load API key
. ../../.env

SYSTEM_PROMPT=$(cat system.md)
TOOLS=$(cat tools_fix.json)

echo "========================================="
echo "REQUEST 1: Cold start with system_report"
echo "========================================="

# Build request using jq to handle JSON escaping
REQUEST1=$(jq -n \
  --arg system "$SYSTEM_PROMPT" \
  --argjson tools "$TOOLS" \
  '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": $system
      }
    ],
    "tools": $tools,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Priming cache",
            "cache_control": {
              "type": "ephemeral"
            }
          },
          {
            "type": "text",
            "text": "system_report: terminals=[], cycle=1"
          },
          {
            "type": "text",
            "text": "calculate sqrt(42)"
          }
        ]
      }
    ]
  }')

echo "$REQUEST1" | jq . > request1.json
echo "Request saved to request1.json"

curl -s https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d "$REQUEST1" | jq . > response1.json

echo "Response saved to response1.json"
echo ""
echo "Usage from response1:"
jq '.usage' response1.json
echo ""

sleep 2

echo "========================================="
echo "REQUEST 2: Second request (simulating cycle 2)"
echo "========================================="

REQUEST2=$(jq -n \
  --arg system "$SYSTEM_PROMPT" \
  --argjson tools "$TOOLS" \
  '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": $system
      }
    ],
    "tools": $tools,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Priming cache",
            "cache_control": {
              "type": "ephemeral"
            }
          },
          {
            "type": "text",
            "text": "calculate sqrt(42)"
          },
          {
            "type": "text",
            "text": "assistant_response: {type: tool_use, name: execute_bash_code, input: {code: \"echo \\\"scale=2; sqrt(42)\\\" | bc\"}}"
          },
          {
            "type": "text",
            "text": "system_report: terminals=[{pid: 1234, sticky_note: \"calculator\"}], cycle=2",
            "cache_control": {
              "type": "ephemeral"
            }
          }
        ]
      }
    ]
  }')

echo "$REQUEST2" | jq . > request2.json
echo "Request saved to request2.json"

curl -s https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d "$REQUEST2" | jq . > response2.json

echo "Response saved to response2.json"
echo ""
echo "Usage from response2:"
jq '.usage' response2.json
echo ""

echo "========================================="
echo "ANALYSIS"
echo "========================================="
echo "Request 1 - Cache creation (expected):"
jq '.usage.cache_creation_input_tokens // 0' response1.json
echo ""
echo "Request 1 - Cache read (expected 0):"
jq '.usage.cache_read_input_tokens // 0' response1.json
echo ""
echo "Request 2 - Cache creation (should be small):"
jq '.usage.cache_creation_input_tokens // 0' response2.json
echo ""
echo "Request 2 - Cache read (should be ~same as req1 cache_creation):"
jq '.usage.cache_read_input_tokens // 0' response2.json
echo ""

CACHE_CREATE_1=$(jq '.usage.cache_creation_input_tokens // 0' response1.json)
CACHE_READ_2=$(jq '.usage.cache_read_input_tokens // 0' response2.json)

if [ "$CACHE_READ_2" -gt 0 ]; then
  echo "✅ SUCCESS: Request 2 read $CACHE_READ_2 tokens from cache"
else
  echo "❌ FAILURE: Request 2 did not read from cache (cache_read_input_tokens = $CACHE_READ_2)"
fi

tools.json

[
  {
    "name": "execute_bash_code",
    "description": "Execute bash code to perform calculations or system operations",
    "input_schema": {
      "type": "object",
      "properties": {
        "code": {
          "type": "string",
          "description": "The bash code to execute"
        }
      },
      "required": ["code"]
    },
    "cache_control": {
      "type": "ephemeral"
    }
  }
]

tools_fix.json

[
  {
    "name": "execute_bash_code",
    "description": "Execute bash code to perform calculations or system operations",
    "input_schema": {
      "type": "object",
      "properties": {
        "code": {
          "type": "string",
          "description": "The bash code to execute"
        }
      },
      "required": ["code"]
    }
  }
]