Skip to content

Instantly share code, notes, and snippets.

@gurdiga
Last active April 30, 2026 18:49
Show Gist options
  • Select an option

  • Save gurdiga/3ce759edce07d2ae60baa12d4b21edae to your computer and use it in GitHub Desktop.

Select an option

Save gurdiga/3ce759edce07d2ae60baa12d4b21edae to your computer and use it in GitHub Desktop.
Claude Code skill: profile individual tool-call durations via the built-in OpenTelemetry console exporter

claude-tool-profiling

A Claude Agent Skill that measures how long individual Claude Code tool calls take — Bash, Edit, MCP tools, everything.

It uses Claude Code's built-in OpenTelemetry instrumentation with the console exporter, so no Docker collector or observability backend is needed. Spans land in stdout, a small awk pipeline extracts tool.execution durations.

Why

When you're scripting lots of tiny tool calls, it helps to know the per-call floor. For example, the benchmark below shows that Bash "echo hello" has an overhead ~20× higher than an MCP stdio tool — which changes how you'd design a multi-step workflow.

Install

git clone https://gist.github.com/gurdiga/3ce759edce07d2ae60baa12d4b21edae.git \
  ~/.claude/skills/claude-tool-profiling

Claude Code picks up ~/.claude/skills/*/SKILL.md automatically on next launch.

Quick demo

bash ~/.claude/skills/claude-tool-profiling/run-experiment.sh \
  5 "Run echo hello via Bash, then say done" Bash

Sample output (Claude Code 2.1.123, macOS arm64):

=== run 1/5 ===
=== run 2/5 ===
=== run 3/5 ===
=== run 4/5 ===
=== run 5/5 ===

=== tool.execution durations (matching tool name substring: Bash) ===
run   duration_ms
---   -----------
1     2807
2     3826
3     3640
4     3609
5     3169

=== stats ===
n=5  min=2807ms  max=3826ms  avg=3410ms

Same experiment against mcp__chrome-devtools__take_screenshot (viewport PNG):

run   duration_ms
---   -----------
1     149
2     162
3     165
4     314
5     163

n=5  min=149ms  max=314ms  avg=190ms

Bash averages 3,410 ms per call; the MCP screenshot averages 190 ms. That ~18× gap is all overhead in the Bash tool path (subprocess spawn, hooks, output capture, IPC back to the CLI) — not platform cost. If you need many tiny shell operations, chain them with && in one Bash call.

Files in this gist

  • SKILL.md — the skill manifest and instructions (this is what Claude Code reads)
  • run-experiment.sh — runs a prompt N times and extracts per-run tool.execution durations
  • .ABOUT.md — this file

License

MIT — do whatever.

#!/usr/bin/env bash
# Profile a Claude Code tool call by running a prompt N times and extracting
# tool.execution durations from OTel console-exported spans.
#
# Usage: run-experiment.sh <runs> <prompt> <tool_name_substring>
# Example:
# run-experiment.sh 5 "Run echo hi via Bash, then say done" Bash
set -euo pipefail
runs="${1:?usage: $0 <runs> <prompt> <tool_name_substring>}"
prompt="${2:?usage: $0 <runs> <prompt> <tool_name_substring>}"
tool_match="${3:?usage: $0 <runs> <prompt> <tool_name_substring>}"
tmpdir="${TMPDIR:-./.tmp}"
mkdir -p "$tmpdir"
logs=()
for i in $(seq 1 "$runs"); do
log="$tmpdir/otel-run-$i.log"
logs+=("$log")
echo "=== run $i/$runs ==="
echo "$prompt" | \
CLAUDE_CODE_ENABLE_TELEMETRY=1 \
CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1 \
OTEL_TRACES_EXPORTER=console \
OTEL_LOG_TOOL_DETAILS=1 \
claude -p --permission-mode bypassPermissions \
> "$log" 2>&1
done
echo ""
echo "=== tool.execution durations (matching tool name substring: $tool_match) ==="
printf "%-5s %-12s\n" "run" "duration_ms"
printf "%-5s %-12s\n" "---" "-----------"
vals=()
for i in $(seq 1 "$runs"); do
log="$tmpdir/otel-run-$i.log"
# Extract duration_ms from the tool.execution span whose parent tool matches.
# Strategy: find each claude_code.tool block, keep tool_name + execution duration,
# emit the execution duration if the name matches.
dur=$(awk -v match="$tool_match" '
/^ name: "claude_code\.tool\.execution"/ { in_exec=1; exec_dur="" }
in_exec && /^ duration_ms:/ { gsub(/,/, "", $2); exec_dur=$2 }
in_exec && /^}$/ { last_exec=exec_dur; in_exec=0 }
/^ name: "claude_code\.tool"/ && !/\.blocked/ && !/\.execution/ { in_tool=1; tool_name="" }
in_tool && /^ tool_name:/ { sub(/^ tool_name: /, ""); sub(/,$/, ""); gsub(/"/, ""); tool_name=$0 }
in_tool && /^}$/ {
if (index(tool_name, match) > 0) { print last_exec; exit }
in_tool=0
}
' "$log")
printf "%-5s %-12s\n" "$i" "${dur:-n/a}"
[ -n "$dur" ] && vals+=("$dur")
done
if [ "${#vals[@]}" -gt 0 ]; then
sum=0; min=999999999; max=0
for v in "${vals[@]}"; do
sum=$((sum + v))
[ "$v" -lt "$min" ] && min=$v
[ "$v" -gt "$max" ] && max=$v
done
avg=$((sum / ${#vals[@]}))
echo ""
echo "=== stats ==="
echo "n=${#vals[@]} min=${min}ms max=${max}ms avg=${avg}ms"
fi
name claude-tool-profiling
description Measure how long Claude Code tool calls take using the built-in OpenTelemetry console exporter. Use when asked to profile a tool, benchmark a tool, measure tool duration, time a tool call, compare tool overhead, or investigate why a tool feels slow.

Claude Tool Profiling

Keep the gist in sync. This skill directory is a git clone of a public gist: https://gist.github.com/gurdiga/3ce759edce07d2ae60baa12d4b21edae

After editing, commit and push:

cd ~/.claude/skills/claude-tool-profiling
git add -A && git commit -m "Update" && git push

Time individual tool calls (Bash, Edit, MCP, etc.) via Claude Code's built-in OTel instrumentation. No Docker, no collector — console exporter dumps spans to stdout, then a small awk extractor pulls out tool.execution durations.

When to reach for this

  • User says "measure how long X takes" where X is a Claude tool
  • Comparing overhead between two tools (e.g. Bash vs an MCP tool)
  • Checking if a tool call's cost justifies a workflow choice (e.g. batching vs. many small calls)

Not for: profiling user scripts, profiling LLM latency (it's captured but isn't the focus), production observability (use a real OTel collector instead).

Quick start

Run the prompt 5 times, extract durations, print stats:

bash ~/.claude/skills/claude-tool-profiling/run-experiment.sh \
  5 \
  "Run echo hello via Bash, then say done" \
  Bash

Args: <runs> <prompt> <tool_name_substring>.

Output: per-run tool.execution ms, then min/max/avg across runs.

How it works

Claude Code emits OTel spans when these env vars are set:

CLAUDE_CODE_ENABLE_TELEMETRY=1
CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1   # gates per-tool spans
OTEL_TRACES_EXPORTER=console            # JSON spans → stdout
OTEL_LOG_TOOL_DETAILS=1                 # include tool_name, full_command, file_path

Each tool call produces a three-span tree:

claude_code.tool               tool_name=X   duration_ms=T          ← total
├── claude_code.tool.blocked_on_user         duration_ms=P          ← permission wait
└── claude_code.tool.execution               duration_ms=E, success  ← actual run

tool.execution is the number you want — permission wait and parent overhead are captured separately.

Running claude -p non-interactively

Pass the prompt on stdin, not as a positional arg — in some sandboxed shells the positional arg is dropped:

echo "<your prompt>" | \
  CLAUDE_CODE_ENABLE_TELEMETRY=1 \
  CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1 \
  OTEL_TRACES_EXPORTER=console \
  OTEL_LOG_TOOL_DETAILS=1 \
  claude -p --permission-mode bypassPermissions \
    > run.log 2>&1

Gotchas:

  • --permission-mode bypassPermissions removes the permission prompt path (tests the fast path)
  • Spans land in stdout, interleaved with the model's final text answer
  • MCP tools configured globally in ~/.claude.json are inherited by claude -p
  • Deferred tools (those needing ToolSearch) cost one extra LLM round-trip per fresh session — factor that into first-call numbers

Extracting durations

Extract all tool spans from a log:

awk '
  /^  name: "claude_code\.tool"/ { in_tool=1; tool=""; dur="" }
  in_tool && /^    tool_name:/ { sub(/^    tool_name: /,""); sub(/,$/,""); gsub(/"/,""); tool=$0 }
  in_tool && /^    duration_ms:/ { gsub(/,/,"",$2); dur=$2 }
  in_tool && /^}$/ {
    printf "  tool=%-48s  duration_ms=%s\n", tool, dur
    in_tool=0
  }
' run.log

For just tool.execution durations:

awk '/name: "claude_code\.tool\.execution"/,/^}$/' run.log \
  | awk '/duration_ms:/ {gsub(/,/,"",$2); print $2}'

Calibration numbers

Baseline measurements on Claude Code 2.1.123, macOS arm64 (April 2026):

Tool avg tool.execution n
Bash "echo hello" 3,410 ms 5
mcp__chrome-devtools__take_screenshot (viewport PNG) 190 ms 5

Bash has a ~3s floor per call from subprocess + hook + IPC overhead. MCP stdio tools don't pay this cost. Rule of thumb: for many tiny shell ops, chain with && in a single Bash call rather than making many calls.

Interpreting results

A full run decomposes into:

claude_code.interaction              ← wall clock
├── claude_code.llm_request × N      ← model thinking (usually dominates)
└── claude_code.tool × M             ← each tool call
    ├── .blocked_on_user              ← permission wait (~ms)
    └── .execution                    ← the number that matters

If .execution is fast but wall time is slow, the overhead is in the LLM, not the tool. Don't blame the tool for round-trip latency.

See also

Docs: code.claude.com/docs/en/monitoring-usage.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment