TL;DR
Most people fly blind on Claude Code spend - they start a project, run it, and check the dashboard once a week. By then the spike has happened. The good news: nearly every surprise cost comes from a handful of fixable levers - inherited context files, defaulting to the most expensive model, caching left off, large files read every session, and no usage limit. Walk this checklist once and you control your spend instead of discovering it.
Why a cost audit is worth 30 minutes
Claude Code does not loudly warn you when your token usage climbs. It assumes you know what every file and setting in your environment does - and most people do not. A config you never touched, a context file from an old project, or the wrong default model can quietly double your spend with nothing extra to show for it. The fix is to audit the levers once, before you scale a workflow, not after the bill arrives.
Lever 1: stale CLAUDE.md files inheriting context
CLAUDE.md files load automatically and carry instructions into every session. A file left in a parent directory or an old project can quietly request extended thinking, large context, or blanket file reads - all of which multiply tokens. Search your system for every CLAUDE.md, review each one, and remove or trim any you did not deliberately write.
bashfind ~ -name 'CLAUDE.md' -not -path '*/node_modules/*' 2>/dev/nullLever 2: defaulting to the most expensive model
Opus is the most capable model and the most expensive per token - many times the cost of Sonnet or Haiku. If your setup runs Opus for simple work like file reads, reformatting, and basic edits, that is pure waste. Use Sonnet as your everyday default, reserve Opus for genuinely hard reasoning, and let Haiku handle the lightweight tasks. See our model-routing guide for a concrete rule.
Lever 3: prompt caching left off
Anthropic offers prompt caching that can cut the cost of repeated context dramatically. If you send the same system prompt, instructions, or project context across many sessions and caching is not enabled, you pay full price every single time. Turning it on is one of the highest-leverage changes you can make on repeat-heavy workflows.
Lever 4: large files read on every session start
If your CLAUDE.md or project config tells Claude to read large files at session start - a full codebase, long docs, an entire test suite - those tokens bill every time you open a session. Across twenty sessions a week that adds up fast. Remove blanket file-read instructions and reference specific files only in the prompts that actually need them.
Lever 5: tool-call loops in agentic work
Every tool call - a file read, a bash command, a search - is a separate API round trip, and each one bills. An agent stuck retrying an error can fire dozens of calls before it gives up. Add loop-detection and a sensible maximum-tool-call limit to your agent instructions so a stuck task fails fast instead of grinding through your budget.
Lever 6: long sessions instead of fresh ones
The longer a conversation runs, the more context rides along on every turn - so each new message in a bloated session costs more than the same message in a fresh one. Start new sessions more often, and use /compact deliberately rather than letting a thread grow indefinitely. A clean session is almost always cheaper and usually sharper.
Lever 7: no usage limits or alerts
Anthropic does not babysit your spend for you. In the Console you can set usage limits and alerts - do it. A runaway loop or a bad overnight run should hit a ceiling, not your card. Set an alert partway through your monthly budget and a hard limit near the top. It takes a few minutes and is the single best insurance against a nasty surprise.
The master audit prompt
Paste this into Claude Code at the start of a new project to run a quick self-audit and flag anything that needs attention.
You are auditing my Claude Code environment for hidden cost risks. Check the following and report back with a pass or fail for each item, plus a one-line fix for anything that fails.
1. Are there CLAUDE.md files in parent directories that may be inheriting context I did not intend?
2. What model is set as my default - Opus, Sonnet, or Haiku? Is it heavier than my typical task needs?
3. Are there any file-read instructions in my project config that run automatically at session start?
4. Is prompt caching being used for my repeated system context?
5. Do my agent instructions have a maximum tool-call limit to prevent loops?
6. Am I running long sessions where a fresh session would be cheaper?
List each item, the current status, and whether action is needed. Be direct. No filler.Common questions
How much can a cost audit actually save?
It varies, but the big levers - dropping the default model from Opus to Sonnet for everyday work, turning on prompt caching, and removing automatic large-file reads - routinely cut repeat-workflow costs by half or more. The exact number depends on how heavy your current setup is.
Where do I set usage limits?
In the Anthropic Console under your billing and usage settings. Set an alert partway through your monthly budget and a hard limit near the top so a runaway process cannot run up an unexpected bill.
Why does an old CLAUDE.md file matter if I am not in that project?
CLAUDE.md files load based on directory, and one sitting in a parent folder above your project gets inherited automatically. If it asks for extended thinking or blanket file reads, every session pays for that whether you meant it or not. That is why the find command above is worth running.
What is prompt caching?
It lets Anthropic reuse large, unchanging chunks of your prompt - system instructions, project context - at a steep discount instead of charging full price each call. On workflows that send the same context repeatedly, enabling it is one of the cheapest wins available.
Does starting fresh sessions really save money?
Yes. Every turn in a conversation carries the whole context so far, so a long thread costs more per message than a fresh one. Starting new sessions and using /compact deliberately keeps each turn lean - and the model tends to stay sharper too.
How often should I run this audit?
Do the full pass once, then a quick re-check before you scale any new workflow or add agents. The levers do not change often, so once your environment is clean it mostly stays clean - you are just guarding against new config creeping back in.
Want to keep your AI spend under control as you scale?
Get the other 3 in the cost-cutting stack - free, with 5,000+ builders.
Join the Club