# Why Claude Code Is Eating My Tokens
## The Problem
In recent weeks Claude Code has been consuming an absurd amount of tokens, even for simple tasks.
The surge started after the new agents system was introduced, but it persists even when agents are disabled; the system prompts were likely updated as well.
One night it spent 75% of the 5-hour limit on my Claude Code Pro subscription on a single prompt, in less than 15 minutes. I knew I had to find a solution.
## Alternatives I Explored
I wanted a command‑line‑only solution that could edit documents and invoke external programs, without being tied to a proprietary IDE. Below is the shortlist of tools I considered.
| Tool | Provider | Free‑Tier Details | Quick Take |
|---|---|---|---|
| Gemini | Google | Limits are generous but the quota policy is opaque. | My go‑to for quick Q&A, reliable when Claude Code loops. |
| Vibe | Mistral | Transparent monthly token allowance. | Attractive UI, but the model’s capabilities fell short of my needs. |
| Kiro | Amazon | 50 credits/month; complex prompts = 1 credit, simple prompts = 0.5 credit. | Fast Anthropic models, but the CLI is very basic. No Plan mode. |
| Codex | OpenAI | No free tier. | Not tried yet, next on my list. |
A more exhaustive comparison of coding agents is available at Artificial Analysis.
If you know of another solid CLI‑based coding assistant, please drop a note!
I also investigated OpenCode for a fully local setup. Current consumer‑grade models still struggle with tool‑calling and coding performance. I’ll revisit this in about three months, since progress in this area is rapid.
## What “Free” Really Means
Most providers likely offer free tokens to collect usage data and gain market share.
The backend handling of prompts and context is not disclosed, even though the client tools themselves are open‑source.
What exactly is done with this information is hard to know for sure.
The biggest difference between the tools is likely how they handle context. Understanding each service’s context‑management strategy and being able to compare them will be the next step in my evaluation.
## Using Claude More Efficiently
Anthropic’s documentation emphasizes that context is the primary cost driver—hence the introduction of parallel agents. A moderate‑sized plan can spin up three agents, each consuming roughly 60 k tokens. Consequently, the 5‑hour limit can be exhausted with fewer than two feature requests.
- Disabling agents in my global config (`~/.claude/CLAUDE.md`) reduces overhead, but not enough.
- Switching to the Haiku model yields about a 25% token reduction, still insufficient for sustained work.
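The estimate above reduces to quick arithmetic. The per‑agent figure comes from my own observations; the session token budget is an assumed round number, since Anthropic does not publish an exact token figure for the 5‑hour window:

```python
# Back-of-envelope cost of one moderate-sized plan with parallel agents.
AGENTS_PER_PLAN = 3
TOKENS_PER_AGENT = 60_000       # rough per-agent context observed in practice
SESSION_BUDGET = 300_000        # hypothetical 5-hour token budget (assumption)

tokens_per_plan = AGENTS_PER_PLAN * TOKENS_PER_AGENT
plans_per_window = SESSION_BUDGET / tokens_per_plan

print(f"One plan costs ~{tokens_per_plan:,} tokens")
print(f"That allows ~{plans_per_window:.1f} plans per 5-hour window")
```

Under these assumptions a single plan burns roughly 180k tokens, so fewer than two feature requests fit in one window.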
## Current Solution: Diversify & Optimize
My short‑term strategy is to spread the workload across multiple tools while sharpening my Claude‑Code usage:
| Task Type | Preferred Tool | Rationale |
|---|---|---|
| Complex coding problems | Claude Code (agents disabled) | Best at deep reasoning, but token‑heavy. |
| Everyday queries & debugging loops | Gemini | Fast, cheap, and handles simple prompts well. |
| Minor code tweaks | Kiro | Low‑cost credits for straightforward edits. |
I added a “no‑agents” instruction to my global CLAUDE file (`~/.claude/CLAUDE.md`) and began delegating tasks explicitly to the appropriate tool.
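For reference, the instruction looks roughly like this. The wording is my own, not an official directive, and a plain‑language instruction like this only steers the model rather than hard‑disabling anything:

```markdown
<!-- ~/.claude/CLAUDE.md -->
# Global instructions

- Do not spawn subagents or launch parallel tasks unless I explicitly ask.
- Answer directly and keep context usage minimal.
```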
I also built a tool that tracks Claude Code usage against the time limits and can send notifications. More on this in my next post.
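To give a taste of the approach (this is a sketch, not the actual tool): Claude Code stores session transcripts as JSONL files, and the records carry per‑message token counts. The directory layout (`~/.claude/projects/`) and the field names (`message.usage.input_tokens` / `output_tokens`) are assumptions based on my local install, not a documented API:

```python
import json
from pathlib import Path


def sum_tokens(jsonl_lines):
    """Sum token counts from Claude Code-style JSONL transcript lines.

    Assumes each record may hold a `message.usage` object with
    `input_tokens` / `output_tokens` fields (field names are an assumption).
    """
    totals = {"input_tokens": 0, "output_tokens": 0}
    for line in jsonl_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        usage = record.get("message", {}).get("usage", {})
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


def scan_transcripts(root=Path.home() / ".claude" / "projects"):
    """Aggregate token usage across every transcript file under `root`."""
    grand = {"input_tokens": 0, "output_tokens": 0}
    for path in root.glob("**/*.jsonl"):
        totals = sum_tokens(path.read_text().splitlines())
        for key in grand:
            grand[key] += totals[key]
    return grand


if __name__ == "__main__":
    print(scan_transcripts())
```

From totals like these, comparing against the window budget and firing a desktop notification is a small additional step.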