Back in January, our monthly AI bill hit $47,000. By April it was under $6,000 — and we're processing more traffic than before. No magic, no switching to worse models. Just smarter routing.
I'm going to walk through exactly what we changed, because I think a lot of teams are in the same spot we were: defaulting to the most expensive model for everything because it's the easiest thing to set up.
Where the money was going
We ran an audit on our API logs and the picture was pretty ugly. About 60% of our requests were simple stuff — classifying support tickets, extracting names from emails, generating short summaries. Tasks that a model like GPT-4o-mini or even Llama 3.1 handles perfectly well for a fraction of the price.
But because someone on the team had set up the integration with Claude Sonnet six months ago and nobody revisited it, we were paying premium rates for every single call. That's not a knock on Claude — it's great for complex reasoning. It's just wildly overkill for "is this email about billing or technical support?"
The routing approach that actually worked
We built a lightweight layer that sits between our application and the LLM providers. Before each API call, it looks at a few things:
- What kind of task is this? A classification prompt needs a different model than a code generation prompt or a nuanced legal summary.
- How complex is the input? A three-word user query doesn't warrant a 128K context window model.
- What's the latency requirement? Customer-facing chat needs sub-second responses. Batch processing doesn't.
- Are there regional constraints? European user data sometimes needs to stay with EU-based providers — more on that in a separate post.
Based on those signals, the router picks the best model from a pool of 50+. Sometimes that's GPT-4o. Sometimes it's DeepSeek. Sometimes it's a free-tier model that handles the task perfectly.
The actual numbers
Here's what our spend looks like now, broken down by task type. These are real numbers from our April traffic:
| Task Type | Tokens/Month | Old Cost | New Cost |
|---|---|---|---|
| Ticket classification | 180M | $10,800 | $27 |
| Email summarization | 95M | $5,700 | $142 |
| Code review assistance | 210M | $12,600 | $3,150 |
| Complex analysis | 45M | $2,700 | $1,890 |
| Customer chat (real-time) | 120M | $7,200 | $420 |
The biggest wins came from the high-volume, low-complexity tasks. Classification and summarization alone went from $16,500/month to $169. That's not a typo.
What about quality?
This was the team's biggest concern, and honestly mine too. We ran parallel testing for two weeks — every request went to both the original model and the router-selected model. Then we had humans blind-evaluate the outputs.
For simple tasks (classification, extraction, short summaries), there was zero measurable difference. For complex tasks, the router actually performed slightly better because it was picking models specifically suited to each task type rather than using a one-size-fits-all approach.
The only place we saw a dip was creative writing — and even then, it was marginal. We pinned that task type to Claude and moved on.
Implementation was surprisingly simple
The whole thing is OpenAI-compatible, so existing code barely changed. We swapped out the base URL and API key:
// Before
const client = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
// After — same SDK, different endpoint
const client = new OpenAI({
apiKey: process.env.WORKCHI_API_KEY,
baseURL: "https://api.workchi.ai/v1",
});That's it. No framework changes, no new libraries, no rewrites. LangChain, LlamaIndex, Vercel AI SDK — they all work unchanged because they all speak the OpenAI protocol.
Three things I'd do differently if starting over
1. Start with an audit, not a solution. We spent two weeks evaluating routers before we even looked at our own logs. Should have audited first — we'd have realized the problem was worse than we thought and moved faster.
2. Don't try to route everything on day one. We started with batch processing (where latency doesn't matter) and gradually moved to customer-facing traffic. Much less stressful than flipping everything at once.
3. Track per-task-type costs from the start. We didn't have this instrumentation initially, which made it hard to prove the ROI to leadership. Set up cost tracking before you start routing.
Wrapping up
I'm not going to pretend routing is some revolutionary insight. The idea of "use cheaper models when you can" is obvious. But actually implementing it — instrumenting your traffic, testing quality thresholds, handling fallbacks — that's the part most teams skip because it sounds like work.
It took us about three weeks end-to-end. The savings paid for the engineering time in the first four days. If your AI bill is over $5K/month and you're using a single model for everything, it's worth a serious look.