Does cutting tokens hurt output quality?

No - the savings come from removing noise, not signal. In practice cleaner prompts often improve consistency, because there are fewer contradictory instructions confusing the model. Keep the constraints that define format and scope; cut the filler.

Do I have to rewrite all my prompts from scratch?

No. Run each existing prompt through the cut list and the three audit prompts. Most people find a 40 to 60 percent reduction on their most-used prompt in under 15 minutes without starting over.

What is the single highest-leverage change?

Move all standing context - project description, tone rules, audience - into the system prompt (or your CLAUDE.md) so it is sent once per session, and keep user messages short. That one change routinely cuts input tokens by around 70 percent on repeated calls.

Where do the big savings come from in an automation?

Repetition. A 500-token waste in a prompt that fires 300 times a month is 150,000 wasted tokens every month. Automation prompts are where trimming pays off most, so pre-flight every one before it goes live.

Should I switch to a cheaper model to save money?

You do not have to. The point of this playbook is that you can keep using Sonnet and Opus and still cut costs dramatically, because the savings come from sending fewer wasted tokens rather than from a lower per-token price.

How do I keep costs from creeping back up?

Set a baseline token count for each prompt or automation, version your prompts with dates so you can roll back, and check usage weekly so creep does not sneak in over time.

Cost & tokens

Cut Your Claude Code Costs 70%

6 minute readUpdated June 2026Explore more

TL;DR

Most Claude bills are inflated by tokens that do nothing - filler instructions, repeated context, role-play preambles. Trim what you send, move standing context into the system prompt once, and the same work costs a fraction. No quality loss, no rewriting your prompts from scratch.

Why your bill is higher than it should be

You pay for every token you send - including the useless ones. When we look at how most people prompt Claude, 40 to 70 percent of the input is dead weight: long role-play preambles, business background repeated on every call, polite throat-clearing, and vague quality adjectives that mean nothing to the model. None of it changes the output. All of it shows up on the bill.

The fix is not switching to a cheaper model. You can keep running Sonnet and Opus. The fix is sending lean prompts and putting standing context where it belongs. Work through the cut list below on any prompt before it goes into production, then automate the cleanup with the three audit prompts.

The cut list: 9 things to remove from every prompt

1Long role-play preambles. 'You are a world-class expert with decades of experience' adds zero instruction. Cut it, or shorten to one clause: 'Act as a direct-response copywriter.'
2Company background paragraphs. If Claude needs context about your project, put it in the system prompt once - not in every user message.
3Politeness phrases. 'Please', 'thank you', 'I'd really appreciate it'. Claude does not respond to flattery; these tokens cost money and do nothing.
4Vague quality adjectives. 'Make it high-quality', 'be professional', 'write engagingly'. Replace with a specific constraint: 'Under 150 words. Short sentences. No jargon.'
5Redundant formatting instructions. 'Use bullet points' plus 'present as a list with dashes' is the same instruction paid for twice. Pick one.
6Negative instruction bloat. Swap a long 'do not do X, do not do Y, do not do Z' block for 'Plain language. Active voice. Direct.' Same output, a fraction of the tokens.
7Repeated context in multi-turn chats. If you established the task in message one, do not re-explain it in message three. Claude holds context.
8Example bloat. Three examples do roughly what one sharp example does, at triple the cost. Use one, and ask Claude to generate variations from it.
9Output length buried in prose. 'A response approximately two to three paragraphs in length' is wasteful. 'Output: 2-3 paragraphs' does the same job in a third of the tokens.

Stop repeating context: use the system prompt right

This is the single highest-leverage change most people can make. They re-send the same context block - project description, tone rules, audience - in every message. The correct architecture is to put all standing context in the system prompt, send it once per session, and keep user messages short: just the task plus session-specific variables.

Restructured this way, a typical research agent drops from roughly 4,200 input tokens per call to around 1,100 for identical output - about 74 percent fewer input tokens. Run that agent dozens of times a week and the saving compounds fast.

Three copy-paste audit prompts

Paste the relevant one before any prompt you want to clean up. The first finds waste, the second compresses, the third pre-flights anything you are about to put into an automation.

Prompt 1 - find the fatAnalyze the following prompt for token waste. Identify: (1) filler
phrases that add no instruction value, (2) redundant context that
repeats the same idea twice, (3) politeness language Claude does not
need, (4) role-play preambles longer than one sentence, (5) vague
adjectives like 'high-quality' with no measurable definition. List
each issue, then output a trimmed version that removes all waste and
preserves full instruction clarity.

[PASTE YOUR PROMPT HERE]

Prompt 2 - compressCompress this prompt without losing instruction clarity. Rules:
remove all filler and politeness; convert prose instructions to
shorthand (e.g. 'Output: bullet list, 5 items max'); eliminate any
instruction that repeats an idea already stated; do not change the
task, tone guidance, or output format. Show the original and the
compressed version side by side with a percentage estimate of tokens
saved.

[PASTE YOUR PROMPT HERE]

Prompt 3 - automation pre-flightI am about to deploy this prompt inside an automation that runs
repeatedly. Review it for: (1) context that should move to a system
prompt instead of repeating per call, (2) missing output-length
constraints that could cause runaway responses, (3) static text that
should be a variable, (4) redundant instructions that could collapse
into one line. Flag every issue with a one-line fix, then give me the
deployment-ready version.

[PASTE YOUR AUTOMATION PROMPT HERE]

Three mistakes people make when trimming

Cutting load-bearing instructions. Do not cut constraints that define output format or scope - those carry weight. Cut filler, not structure.
Removing every example. One example beats zero for generation tasks; zero is fine for extraction. Know which task you have before you cut.
Compressing without testing at scale. A trimmed prompt can work on simple inputs and break on edge cases. Test across at least 10 varied inputs before full deployment.

Common questions

Does cutting tokens hurt output quality?
No - the savings come from removing noise, not signal. In practice cleaner prompts often improve consistency, because there are fewer contradictory instructions confusing the model. Keep the constraints that define format and scope; cut the filler.
Do I have to rewrite all my prompts from scratch?
No. Run each existing prompt through the cut list and the three audit prompts. Most people find a 40 to 60 percent reduction on their most-used prompt in under 15 minutes without starting over.
What is the single highest-leverage change?
Move all standing context - project description, tone rules, audience - into the system prompt (or your CLAUDE.md) so it is sent once per session, and keep user messages short. That one change routinely cuts input tokens by around 70 percent on repeated calls.
Where do the big savings come from in an automation?
Repetition. A 500-token waste in a prompt that fires 300 times a month is 150,000 wasted tokens every month. Automation prompts are where trimming pays off most, so pre-flight every one before it goes live.
Should I switch to a cheaper model to save money?
You do not have to. The point of this playbook is that you can keep using Sonnet and Opus and still cut costs dramatically, because the savings come from sending fewer wasted tokens rather than from a lower per-token price.
How do I keep costs from creeping back up?
Set a baseline token count for each prompt or automation, version your prompts with dates so you can roll back, and check usage weekly so creep does not sneak in over time.

Want the rest of the cost-cutting playbook?

Get the other 3 in the cost-cutting stack - free, with 5,000+ builders.

Join the Club