Perspective · May 2026 · 8 min read

Where Does Our Data Go?

When a business considers integrating AI capabilities, there is the polite question that gets asked in procurement meetings, and the real question that gets asked behind closed doors, usually after someone reads a contract more carefully than they meant to.

The polite question: Where does our data reside?

The real question: If my company sends a year’s worth of customer conversations to a frontier AI lab, and that lab trains its next model on what we sent, what stops the model from helping the next person who tries to compete with us?

This is the question the AI industry has been very good at avoiding for three years. We think it is the question that should decide what kind of relationship businesses have with AI for the next ten. We believe an honest answer to it is what makes open source AI a safer and more ethical position, not just a technical one.

Opportunity vs. Opportunity Cost

When a closed-model API is offered to a business, the offer comes with a set of defaults. Some of those defaults are visible. Most of them are not. Your business sends prompts containing your company’s proprietary information and potentially your customers’ data. The model gives you a response. In exchange, depending on the tier and the contract, that data may be retained for a period of time, reviewed by humans for safety, used to improve the next version of the model, used to build classifiers, or used to train smaller models the vendor sells separately. The list of mays is long. The list of will nots is short and lives in a contract addendum somebody negotiated last summer.

For most businesses this was acceptable, because the model the vendor was building was treated as a tool. Tools get sharper with use. Sharper tools serve everyone better. That framing was acceptable when the model felt like infrastructure.

However, the model has stopped feeling like infrastructure. The model is the product. And the product gets better, in part, based on the interactions of the businesses that use it, including businesses that compete with each other.

This is the structural tenet that makes the question an ethics question, not a compliance question.

Who gets smarter

Imagine two companies in the same market. Both onboard the same closed-model API. Both use it for customer-facing work, which means both are sending real customer conversations and data into the lab’s infrastructure.

The lab’s next model gets better because it has seen the real-world data from both. That model is then offered back to both companies, at the same price, with the same capability.

Now imagine a third company, the lab itself, decides to enter the same market. That third company has access to the latent learnings of all its potential competitors, held in a model with the institutional memory gleaned from its previous users. Those users include your company.

Nothing in this scenario is illegal. Today, it isn’t even considered obviously wrong by current industry norms. But it is also, plainly, not the relationship most companies thought they were entering into when their legal team signed the API contract.

What has changed in the last eighteen months is that this scenario is no longer hypothetical. The frontier labs are building products. They are entering markets. They are competing, sometimes directly, sometimes through acquisition, with the businesses that have been their best customers. The emerging competition has the data because they own the model. The emerging competition has the institutional memory of every prompt that was sent to it.

Most coverage of “enterprise AI strategy” does not discuss the real question out loud.

What was actually traded

The trade was framed as data for capability. Send us your data, get the output: efficiencies, better results, increased abilities and skills.

A more honest framing: you trade your data for capability today, and the model owner leverages that data for its own capability tomorrow.

Each year of customer interactions a business sends to a closed API is a year of compounding advantage for the lab that hosts it. Yes, over time, the business might get a slightly better model. But for certain, the lab gets a treasure trove of data to develop new lines of business. Over time the gap between what you got and what they got widens, and the side that hosts the model has the advantage of leveraging the difference.

If you ran a small advertising firm and your largest client started a competing advertising firm using everything they learned about your industry from working with you, you would call that what it is. You would not call it innovation. 

What customers were never asked

What completes this picture is precisely what the procurement memo leaves unsaid.

Most businesses that send their customer interactions into a closed-model API are doing so on behalf of customers who never agreed to that. The terms of service the customer accepted, in most cases, said the business would handle their data responsibly. They did not say “responsibly, except when a third party uses it to train an AI model that will be offered to anyone who wants to compete with the business they hired.”

If the customer knew that was the trade, would they still consent?

In 2026, that question is one a thoughtful business should be asking itself. The legal answer might be that the consent is wide enough to cover it. The ethical answer may be different, and it is a conversation a lot of customer-trust-driven businesses are quietly having with themselves.

The architectural answer

Open weight models do not train on what you send them.

That is not a marketing line. It is the architectural fact of the open weight ecosystem in 2026. The weights of a published open weight model (Mistral, Llama, GLM, the production-grade set) are frozen at release. The model running in your environment, on hardware you control, learns nothing from what passes through it. There is no upstream pipeline. The publisher of the weights has no mechanism for collecting your prompts, and in a self-hosted deployment the vendor is you. It is not a question of trust. It is a question of architecture.
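To make the "frozen at release" point concrete, here is a minimal sketch in plain Python. It is a toy, not a real model: the weights, the `infer` function, and the `fingerprint` helper are all hypothetical names chosen for illustration. It shows only one narrow thing, that a forward pass reads the weights without writing to them, so a checksum taken before serving requests matches one taken after. It says nothing about what the surrounding serving stack logs, which in a self-hosted deployment is yours to configure.

```python
import hashlib
import struct

# Hypothetical toy "model": a short list of weights standing in for a
# published checkpoint. Real checkpoints hold billions of parameters.
weights = [0.5, -1.25, 2.0]


def fingerprint(params):
    """Return a SHA-256 hex digest of the parameters' raw bytes."""
    packed = struct.pack(f"{len(params)}d", *params)
    return hashlib.sha256(packed).hexdigest()


def infer(params, features):
    """Forward pass only: a dot product that reads params and never writes them."""
    return sum(w * x for w, x in zip(params, features))


# Fingerprint the weights, serve a few "prompts", then fingerprint again.
before = fingerprint(weights)
prompts = [(1.0, 2.0, 3.0), (0.0, 1.0, 0.0)]
outputs = [infer(weights, p) for p in prompts]
after = fingerprint(weights)

# Inference produced answers, and the weights are byte-for-byte identical.
weights_unchanged = before == after
print(outputs, weights_unchanged)
```

Weights change only when a separate training step is run against them. With a self-hosted open weight model, whether that step ever runs, and on what data, is a decision made inside your own environment.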

If you fine-tune the model on your own data, the resulting model lives where you put it. It is your model, with your data inside it. No one has access to the refined model unless you give it to them.

This single difference is what makes open weights the ethically clean answer to the question we posed in our opening.

It is also why we believe so deeply in open source AI as the default for businesses that take their customer relationships seriously. The capability gap between open and closed AI has closed. The cost gap has flipped in favour of open source. The remaining gap — the only one that still favours the closed labs at scale — is the convenience of someone else running the infrastructure for you.

We think that is a gap worth closing for a reason that is no longer just about price. It is worth closing because the alternative is to keep handing the lever of your business to a counterparty whose own interests will not always align with yours.

How customers can tell

In our experience, the version of this argument that finally shifts the conversation is neither the architectural nor the legal one—it is the one grounded in the customer.

Customers notice. Perhaps not on the first encounter, but eventually. They notice when a company's AI feels off-the-shelf, when it echoes the same chatbot they met at a competitor the week before, when its answers are accurate in the abstract and foreign in the particular. They notice when the fabric of the relationship begins to wear thin.

The companies that come out ahead in this cycle, the ones that compound rather than coast, will be those whose AI interactions feel like them, not like a frontier lab borrowed for the afternoon. That distinction is hard to build on top of a stack you do not own.

It is straightforward to build on top of a stack you do own.

What BasedAI is building, and why

We started BasedAI because we believe open source AI now has the capability to have material impact at the business layer—not merely at the research layer.

Hirebase, our first product, is the operator layer: AI hires curated to perform specific work, who join your team where it already works and produce work your team can review before it goes out. Closed beta is open at hirebase.co for solopreneurs and small teams today. We are starting design-partner conversations now with the next-tier companies whose teams want a similar relationship to the technology, at scale.

BasedAPIs, our second product, is the inference layer: open weight models served reliably for production workloads, designed so that the answer to “who learns from this?” is your own systems, on your terms, and no one else. Coming soon. If you want to be early to the closed alpha, reach out at api@basedai.co.

The third pillar of the company is consolidation: bringing fragmented pieces of the open source AI ecosystem into a coherent platform. Most of that work happens out of sight. What you will see in 2026 and 2027 is more products like Hirebase and BasedAPIs, packaged in a way that makes open source AI accessible and commercially ready.

We did not build the company because closed AI is evil. The closed labs employ a lot of careful engineers who care about a lot of the same things we do. We built BasedAI to address the structure of the closed-model deal, which potentially leaves businesses and their customers holding the wrong end of a long lever. Open weight infrastructure flips the lever back.

The decision to move to open source AI used to be a cost decision, and then it became a capability decision. In 2026 it is becoming an ethics decision. The companies that shift toward open source AI first are going to look, in five years, like the companies that took customer privacy seriously in 2014: early, and very obviously right in hindsight.

If you are having similar conversations within your team, we would like to hear from you.