Cost & tokens

Route Cheap, Run All Day

6 minute readUpdated June 2026Explore more

TL;DR

Most tasks do not need your most powerful model - they need something good enough, fast, and cheap. Route the heavy work to Sonnet or Opus and let Haiku handle file reads, summaries, reformatting, and status checks. You stop brushing against limits because most calls barely cost anything, and your bill drops without any loss of quality on the work that matters.

Why you keep hitting the ceiling

Heavy Claude Code users hit limits and watch costs climb for the same reason: every task gets sent to the most powerful model, whether it needs it or not. The obvious fix - just pay for more - scales your cost linearly with usage, so the more you automate the more you pay. The smarter fix is to route. Send the hard stuff to Sonnet or Opus and everything else to Haiku, which is fast, cheap, and handles the bulk of routine work without complaint.

Step 1: confirm your cheap model works

You need an Anthropic API key from the Console. Once you have one, this quick test confirms Haiku is reachable before you build anything on top of it.

bashcurl https://api.anthropic.com/v1/messages \
  -H "x-api-key: YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Reply with: Haiku is live."}]
  }'

Step 2: write the routing rule

This is the part people skip - and then wonder why nothing changed. Without a rule, every call still goes to your default model. The logic is simple: reasoning about architecture, debugging a hard error, or writing logic you have not defined yet goes to Sonnet or Opus; reading a file, writing a comment, reformatting, summarising a log, or checking syntax goes to Haiku. Have Claude generate the function for you.

I am building a Claude Code setup with a cheap fallback model (Haiku) to keep costs down and avoid brushing against limits. Write me a Python function called route_model(task_description) that takes a plain-English description of a task and returns either "claude-sonnet-4-6" or "claude-haiku-4-5-20251001". Use these rules: route to Haiku for file reads, summarization, reformatting, syntax checks, boilerplate, and status messages. Route to Sonnet for architecture decisions, complex debugging, multi-step reasoning, and anything involving business logic I have not defined yet. Include a catch-all that defaults to Haiku to minimize cost. Add a comment explaining each routing decision.

Step 3: cap your spend and track it

Before any agent runs in production, set a hard monthly usage limit in the Anthropic Console - a runaway loop should hit a ceiling, not your card. Set it at roughly three times what you expect to spend so you have room to grow but a worst case that cannot hurt. Then add a lightweight tracker so you can see where your money actually goes.

Write me a Python function called track_api_cost(model, input_tokens, output_tokens) that calculates the estimated cost of an Anthropic API call based on current public pricing for claude-haiku-4-5-20251001 and claude-sonnet-4-6. Store a running total in a local JSON file called api_costs.json with keys: date, model, input_tokens, output_tokens, estimated_cost_usd, and cumulative_monthly_cost_usd. If the cumulative monthly cost exceeds a MONTHLY_CAP variable I can set at the top of the file, print a warning and return False to halt the call. Otherwise return True. Keep the code under 60 lines.

Three mistakes that break this

  • No routing rule, so everything still goes to your big model - you hit limits just as fast and costs stay high. Build the rule in step 2 before running anything.
  • No spend cap set, so a bad loop can run up a large bill overnight. Set the cap before your first real run, not after.
  • Routing genuine reasoning to Haiku - it will answer hard tasks, just often with the wrong answer delivered confidently. Be honest about what needs real reasoning and send those to Sonnet or Opus every time.

Tune it after a week

Run your normal workflow for seven days, then look at where the spend landed. If Haiku keeps handling tasks that need a Sonnet follow-up, promote those. If Sonnet is doing work Haiku could handle, demote those. One round of tuning typically trims another 20 to 30 percent off an already-reduced cost - and from there the bottleneck stops being cost and starts being what you decide to automate next.

Common questions

  • Which model should be my default?

    Sonnet for everyday work, Haiku for lightweight tasks (file reads, summaries, reformatting, status checks), and Opus only for genuinely hard reasoning. The routing rule in step 2 encodes exactly this split so you do not have to decide call by call.

  • Will routing to Haiku hurt my output quality?

    Not on the tasks it is suited for - Haiku handles the bulk of routine work cleanly. Quality only drops if you push real reasoning onto it. Keep complex debugging and architecture on Sonnet or Opus and you get the savings with no downside.

  • Do I need to be a developer to set this up?

    The routing and tracking pieces involve a little Python, but Claude writes both functions for you from the prompts above - you describe what you want and paste the result in. If you can run a command in your terminal, you can run this.

  • How is this different from the cost audit?

    The cost audit finds settings and habits quietly inflating your bill across the board. This guide is one specific, high-impact tactic from that world: matching each task to the cheapest model that can do it well. Use the audit to clean house, then routing to stay lean.

  • What are the current model names to use?

    Use the latest model identifiers from Anthropic's documentation - the prompts here use Haiku 4.5 and Sonnet 4.6 as examples. Model names update over time, so check the Console or docs and swap in the current identifiers if newer ones have shipped.

  • Where do I set the spending limit?

    In the Anthropic Console under billing and usage. Set a hard monthly limit at roughly three times your expected spend, plus an alert partway through, so you have room to scale but a firm ceiling against runaway costs.

Want a setup that runs all day without breaking the bank?

Get the other 3 in the cost-cutting stack - free, with 5,000+ builders.

Join the Club