Perspective · July 2026 · 6 min read

Routing, Not Rationing

There is a number buried in this season’s most-discussed cost memo that matters more than the headline it sat under, and it’s the least glamorous one on the page: a cache hit rate that went from 5% to 60%. It tells you where the cost of AI actually comes from — not the model’s sticker price, but the layer that routes and reuses the work. This is why model choice has become an operating discipline, and why the routing layer, not the model, is the thing worth owning.

Routing, Not Rationing

The most important AI cost number this season wasn't a model price. It was a cache hit rate that climbed from 5% to 60%.

There's a number buried in the most talked-about AI cost memo of the season, and it's the least glamorous one on the page. Not a model price. Not a benchmark score. Nothing anyone screenshots. It's a cache hit rate, the share of requests answered from work already done, that went from five percent to sixty percent once the requests were made cache-aware. It sat several paragraphs below the headline, and I will tell you right now: if you are sizing an AI budget, that is the part to read twice.

This number matters because it puts the wrong argument to rest. We have spent two years debating which model is cheapest, as if the bill were a function of sticker prices on a rate card. That number says otherwise. Most of what an AI workload costs is decided not by the price of the model you picked, but by the layer around it. The part that decides which model each task goes to and whether the same work gets paid for twice. The cost of AI looks like a pricing challenge, but it is actually an operations challenge.

What the discipline actually looks like

Once you see cost this way, the playbook stops being about finding the cheap model. It becomes something closer to running a kitchen well. There are three moves, and none of them are about the model itself.

First, send each task to the model that fits it. Stop running everything through the most capable model you have. A frontier model is the right tool for a genuinely hard reasoning step, but it is the wrong tool for the routine majority of work that surrounds it. Second, make the inexpensive capable path the default so usage flows there naturally without anyone being told to ration. Third, reuse relentlessly, because the cheapest request is the one you already answered. It really is this simple. Defaults, routing, reuse. Your savings live in the orchestration layer, not the model.

This is the real shift in mindset. Model choice has become an operating discipline, the same way cloud cost management or inventory routing did before it. It is something you run continuously and well, not a procurement decision you make once and forget. And like those disciplines, the advantage compounds for whoever owns the layer that does the routing, because models keep changing underneath while the discipline of routing them does not.

The receipts, for anyone who wants them

I am describing this as settled because, in the last two weeks, two companies demonstrated both halves of it in public.

One published the playbook. The CEO of a public company wrote up how it cut AI spending nearly in half while usage kept growing, and framed it not as a secret weapon but as infrastructure anyone could copy. Default to open-weight models. Route each task to the one that fits. Cache aggressively. Make usage visible instead of capped (Brian Armstrong on X, June 26). That 5% to 60% cache figure is his. So is the detail I find most telling: 91% of their engineers never hit a usage limit, which is why they moved the default instead of lowering the ceiling. You don’t ration the thing you want more of.

The other showed why the models you route across should be ones you can hold rather than only rent. The same week, a leading lab released its newest model and, at the government's request, limited it to roughly twenty approved partners. The lab itself said that kind of access process should not become the long-term default (CNBC, June 26). I don’t view that as a failure on their part. They are careful engineers complying with a real constraint, and I take them at their word about not wanting it to be permanent. I view it as a clean illustration of what single-vendor dependence means. The best closed tool can become unavailable to you, on a timeline you do not set, for reasons that have nothing to do with your business. A routing layer built on open weights you can serve yourself does not have that failure mode.

Combine the two, and the argument makes itself. The saving comes from the routing. The durability comes from routing across models you control.

Why a frontier lead makes the case stronger, not weaker

The part it would be dishonest to skip

Closed frontier models still hold a real edge on the hardest tasks. I am talking about graduate-level science, frontier mathematics, and genuinely novel planning steps. The best open models trail by about three to four months according to Epoch AI. That gap is narrow and shrinking, but it is real.

Consider how that actually shifts the debate. If a frontier model earns its place on the hardest ten percent of your work, and open weights win the routine ninety percent at a fraction of the cost, then the right answer is neither always open nor always closed. The right answer is to route. Send the hard reasoning to the model that reasons best. Send the high volume execution to the model that does it well for almost nothing. The existence of a frontier lead is not a reason to skip routing. It is the strongest reason to do it.

The layer worth owning

This is the work we do at BasedAI. Depending on who you are, there are two ways to access our routing layers.

If you have engineers, BasedAPIs is the routing layer as infrastructure. It provides open-weight models served for production behind a single API, with the defaults, routing, and reuse built in. It is the difference between spending a quarter assembling the thing the playbook describes and spending an afternoon calling it. 

If you do not have engineers, Hirebase is the same discipline productized for teams that will never have a platform organization at all. You hire an AI coworker for a role, and it routes each task to the right open-weight model on its own. A small business gets the cost structure of a company with its own gateway and never sees the machinery. Same discipline, two front doors. The API for the people who build, and the product for the people who just need the work done.

The model you run on will keep changing. New open weights emerge almost weekly, and the frontier will hold its narrow lead on the hardest problems a while longer. The thing that keeps its value through all of it is the layer that routes the work well across whatever is best. It runs on infrastructure you control rather than terms you do not. That is the asset we’re building.

BasedAI is the acceleration and commercialization layer for open source AI. We build products that turn the best of open source into reliable, useful work for real people and real businesses.

If you are deciding whether to build that layer or run on one already built, I’d be happy to compare notes.