Vibe Coding  

Intercepting and Decoding Claude Code API Calls Using MITM Proxy

Your team is evaluating AI coding tools — Claude Code, Copilot, Cursor — for production use. Before you adopt any of them, one question matters more than the demo: what exactly does this tool send to the provider on every request? Compliance teams will ask. Cost analysis depends on it. And honestly, if you don't know, you can't troubleshoot it.

In this article, we will intercept Claude Code's live API calls using a man-in-the-middle proxy, decode the full 80 KB payload sent to Anthropic, and walk through the agent loop that drives the tool. Before starting, I assume you have basic familiarity with HTTP traffic inspection and that you have Claude Code installed locally.

What is Claude Code?

Claude Code is Anthropic's official command-line agent for software engineering tasks. You give it a natural-language instruction, and it reads files, runs shell commands, edits code, and verifies its own work — all by calling tools defined in its system prompt. Underneath, it is an HTTPS client talking to the Anthropic Messages API.

What is mitmproxy?

mitmproxy is an open-source, scriptable proxy that decrypts HTTPS traffic by acting as a trusted intermediary. It ships with three frontends — mitmproxy (terminal), mitmweb (browser-based UI), and mitmdump (non-interactive). For this walkthrough we will use mitmweb.

Prerequisites

  • Claude Code installed and authenticated (setup guide)

  • Python 3.10 or higher (for installing mitmproxy)

  • A code editor such as VS Code to inspect JSON

Step 1: Install and Start the Proxy

First, install mitmproxy using pip.

pip install mitmproxy

Now, start the web-based proxy on port 8080.

mitmweb --listen-port 8080

Open http://localhost:8081 in your browser. This is the inspector UI where intercepted traffic will appear.

Note: On first run, mitmproxy generates a self-signed certificate at ~/.mitmproxy/mitmproxy-ca-cert.pem. We will use this in the next step.
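If you want to keep raw request bodies for offline analysis instead of copying them out of the UI, mitmproxy addons are plain Python classes with hook methods such as request(). A minimal sketch — the filename, host filter, and output naming here are my own choices, not part of the tutorial's setup:

```python
# save_bodies.py — hypothetical mitmproxy addon that saves every request
# body sent to api.anthropic.com as a numbered JSON file.
class SaveBodies:
    def __init__(self):
        self.count = 0

    def request(self, flow):
        # flow.request is mitmproxy's HTTP request object
        if "api.anthropic.com" in flow.request.pretty_host:
            self.count += 1
            with open(f"claude_request_{self.count}.json", "wb") as f:
                f.write(flow.request.content)

addons = [SaveBodies()]
```

Load it alongside the web UI with mitmweb --listen-port 8080 -s save_bodies.py.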

Step 2: Route Claude Code Through the Proxy

Open a second terminal. We need to do three things — route HTTPS traffic through the proxy, tell Node to trust the self-signed certificate, and start a fresh Claude Code session.

export HTTPS_PROXY=http://localhost:8080
export NODE_EXTRA_CA_CERTS=~/.mitmproxy/mitmproxy-ca-cert.pem
claude

Note: The NODE_EXTRA_CA_CERTS line is the part most people miss. Without it, Node's TLS layer rejects the proxy's self-signed certificate and Claude Code fails silently — no error, just no traffic. If your proxy stays empty after sending a prompt, this is almost always why.
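Because the failure mode is silent, a tiny preflight check before launching claude can save a debugging session. A sketch — the check names are my own, and it only verifies the environment, not the proxy itself:

```python
import os

def preflight(env=None):
    """Sanity-check the Step 2 environment before launching claude.

    Returns a dict of check-name -> bool. Catches the silent-failure
    case where NODE_EXTRA_CA_CERTS is unset or points nowhere.
    """
    env = os.environ if env is None else env
    cert = env.get("NODE_EXTRA_CA_CERTS", "")
    return {
        "HTTPS_PROXY set": bool(env.get("HTTPS_PROXY")),
        "NODE_EXTRA_CA_CERTS set": bool(cert),
        "cert file exists": bool(cert) and os.path.exists(os.path.expanduser(cert)),
    }
```

Run it in the same shell where you exported the variables; any False means Claude Code will start but the proxy will stay empty.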

Step 3: Intercept Your First Request

Inside the Claude session, send a small instruction to trigger an API call.

> read package.json and tell me which test framework is used

Switch to the mitmweb inspector at http://localhost:8081. You will see one or more entries appear. Click into the POST /v1/messages request and look at the size.

80 KB. For one sentence.

This is the moment that motivates the rest of this tutorial. A simple prompt shipped 80 kilobytes of structured JSON to Anthropic. Now we decode what is actually inside.

Step 4: Decode the 80 KB Payload

Reading raw JSON in the inspector is brutal — the structure is dense and most of it is repeated boilerplate. To make the payload readable, I built a small parser specifically for Claude Code requests.

Open the parser in your browser, copy the request body from mitmweb, and paste it in.

https://jitangupta.github.io/tools/claude-code-request-viewer.html

The output groups the payload into four sections. Each one teaches a different lesson about how Claude Code works.

1. System Prompt. The model identity block — "You are Claude Code, Anthropic's official CLI tool..." — sent on every request. It is identical across requests, which is what lets Anthropic cache it server-side (cache entries live for 5 minutes).

2. Tool Definitions. Full JSON schemas for every tool: Bash, Edit, Grep, Glob, Read, Write, Agent, and others. Each schema defines parameters, types, and — importantly — when the tool should and should not be used. This is the bulk of the static payload, and it is also cached.

3. CLAUDE.md (injected as a system reminder). This one surprises most engineers. Whatever you write in CLAUDE.md at the root of your project gets injected, word for word, into every API request. If your CLAUDE.md is bloated, every request carries that weight.

4. Conversation History. The messages array — your prompts, Claude's responses, every tool call, every tool result. This is the part that grows over time.

Note: For enterprise compliance reviews, the CLAUDE.md injection is the detail to flag. Anything in that file leaves your network on every request. Treat it as outbound data.
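The four sections map onto top-level fields of the Messages API body: the system prompt is "system", the tool definitions are "tools", and both the conversation history and the injected CLAUDE.md live inside "messages". If you copied a request body out of mitmweb, a few lines of Python approximate where the bytes go — a rough sketch, since it measures re-serialized JSON rather than the exact wire bytes:

```python
import json

def section_sizes_kb(payload: dict) -> dict:
    """Approximate serialized size of each top-level section, in KB.

    Note: CLAUDE.md and the conversation history both sit inside
    'messages', so they are not separable at this level.
    """
    return {key: len(json.dumps(payload.get(key, ""))) / 1024
            for key in ("system", "tools", "messages")}

# Usage: save the body from mitmweb as request.json, then:
# with open("request.json") as f:
#     for name, kb in section_sizes_kb(json.load(f)).items():
#         print(f"{name:9} {kb:7.1f} KB")
```

On a fresh session you should see "tools" dominating, which matches the observation above that tool definitions are the bulk of the static payload.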

Step 5: Watch the Agent Loop Grow

Here is the mental model that ties everything together. Claude Code is a while loop.

while not done:
    payload = assemble(system_prompt, tools, claude_md, history)
    response = api_call(payload)
    if response.contains_tool_call:
        result = execute_tool_locally(response.tool_call)
        history.append(response)
        history.append(result)
    else:
        done = True

There is no separate planner. The loop is the planner. Every iteration, the model decides whether it has enough information to answer, or whether it needs another tool call. To see this in action, give Claude a multi-step task.

> run the tests and fix any failing test

Switch back to the mitmweb inspector and watch the request sizes as Claude works.

Request 1:  80 KB   (initial prompt)
Request 2:  95 KB   (after Bash ran tests)
Request 3: 110 KB   (after Read on failing file)
Request 4: 130 KB   (after Edit applied fix)

Every iteration appends the previous tool call and its result to messages. The payload grows because the conversation history is cumulative, not because the system prompt or tool definitions change.

Step 6: Understand the Cost Economics

Now the natural question — if the payload grows on every request, does the cost grow proportionally?

No. And the reason is prompt caching. To verify this, I ran cclogviewer on a session log to see the actual token breakdown.

Request 1: 8,023 tokens cache write + ~2,000 tokens fresh input
Request 2:   ~300 tokens fresh      + 12,203 tokens cache read
Request 3:   ~250 tokens fresh      + 12,500 tokens cache read

Two numbers matter for cost analysis.

  • Cache write is approximately 25% more expensive than fresh input — but it only happens once.

  • Cache read is approximately 10 times cheaper than fresh input.

Even though every request resends the full system prompt, tool definitions, and CLAUDE.md, those bytes are billed at a fraction of the normal rate after the first request. Only the new content — your latest message and the freshly produced tool output — is charged at full price.

This is why a 10-tool-call session does not cost 10 times the first request. With caching, it scales close to linearly with new content. Without caching, it would scale roughly quadratically.
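To make the scaling concrete, here is a toy model of a session's billed input tokens. The 1.25x and 0.1x multipliers are the ratios quoted above; the token counts are illustrative round numbers, not measured values, and the model ignores output tokens entirely:

```python
def billed_tokens(turns, static=12_000, new_per_turn=300):
    """Toy model: cumulative billed input-token units for a session,
    with and without prompt caching. Illustrative numbers only."""
    WRITE, READ = 1.25, 0.10   # cache write / read cost vs. fresh input
    cached = uncached = 0.0
    history = 0
    for turn in range(turns):
        prefix = static + history              # everything sent before
        if turn == 0:
            cached += prefix * WRITE           # first request writes the cache
        else:
            cached += prefix * READ            # later requests read it cheaply
        cached += new_per_turn * WRITE         # new content, cached for next turn
        uncached += prefix + new_per_turn      # no cache: full price every turn
        history += new_per_turn
    return cached, uncached
```

Comparing billed_tokens(10) against billed_tokens(1) shows the cached total growing far more slowly than the uncached one: the uncached column pays for the whole prefix on every turn, which is where the quadratic behavior comes from.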

Step 7: Replay Sessions From Local JSONL Logs

Claude Code records every session locally as a JSONL file. Navigate to the project log directory.

~/.claude/projects/<project-folder>/<session-id>.jsonl

Each line is one request or response. Raw, the file is impractical to read. The community has solved this with two tools.

  • claude-replay — drag a JSONL file in, get a visual playback of the conversation including every tool call and result.

  • cclogviewer — terminal tool with a per-request token cost breakdown.

For enterprise teams adopting Claude Code, these JSONL logs are gold for two reasons. First, audit and compliance — you can see exactly what left the machine. Second, debugging — when a session goes off the rails, replay shows you the exact iteration where Claude made the wrong call.
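For a quick look without installing either tool, a few lines of Python give you a per-line index of a session. The 'type' field name is an assumption about the log schema — the format is undocumented and may change, so verify against a real file:

```python
import json

def index_session(lines):
    """Return one (line_number, entry_type) pair per JSONL entry.

    The 'type' key is an assumed field name; entries missing it
    are reported as '?'. Blank lines are skipped.
    """
    return [(n, json.loads(line).get("type", "?"))
            for n, line in enumerate(lines, 1) if line.strip()]

# Usage against a real session log:
# with open(path) as f:
#     for n, kind in index_session(f):
#         print(n, kind)
```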

See the Full Demo

The proxy walkthrough, the parser output, the agent loop, and the cost breakdown are easier to follow visually than in prose. The video runs through every step end to end.

Summary

We intercepted live Claude Code API calls using mitmproxy, decoded the 80 KB payload into its four sections (system prompt, tool definitions, CLAUDE.md, and conversation history), watched the agent loop grow request payloads across a multi-step task, and analyzed the token economics that keep the cost in check thanks to prompt caching.

For engineering teams evaluating Claude Code for production use, three takeaways matter most. First, your CLAUDE.md ships on every request — treat it as outbound data and keep it lean. Second, the agent is a while loop, not a planner — payload growth is expected behavior, not a bug. Third, prompt caching is what makes this architecture economically viable at scale.

Try the interception on your own setup, paste a real request into the parser, and see what your team's actual usage looks like. If you are evaluating AI coding tools for an enterprise context, this is the level of visibility you should have on every tool you adopt.