Atwood — Where your data actually goes: training sets, retention, and leaked keys

Three quiet ways a careless AI rollout leaks your crown jewels: your prompts become training data, your inputs sit in a provider's logs, and your API keys escape. Here’s how each happens, and how to stop it.

When an organization starts using AI, three questions rarely get asked until it's too late: does my data train someone else's model? how long is it stored? and where are my keys? All three have specific, well-understood failure modes, and all three are avoidable.

Risk 1 — your prompts become training data

On consumer and free tiers, the text you type in can be retained and used to improve the provider's model. That means your draft contract, your donor list, or your strategy memo can become part of a model you don't control — and potentially surface in someone else's output later.

Example: a staffer pastes a confidential acquisition memo into a free chatbot to "tighten the wording." On that tier, the content may be retained and used for training. The text has now left your control entirely, with no way to claw it back.

Paid and enterprise tiers are usually different — reputable API and enterprise products typically don't train on your data and offer zero-retention options. But "typically" isn't "automatically." The fix is to verify, not assume: use a no-train tier, turn off training on your data, sign a data-processing agreement (DPA), and strip PII before anything leaves your boundary. For the most sensitive work, run open models in your own environment so nothing leaves at all.
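To make "strip PII before anything leaves your boundary" concrete, here is a minimal sketch in Python. The patterns and the redact_pii name are illustrative assumptions, not a complete detector: three regexes will miss names, addresses, and most real-world PII, so treat this as the shape of the step rather than a finished control.

```python
import re

# Illustrative patterns only (assumed for this sketch); real PII detection
# needs far broader coverage than three regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each match with a bracketed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact_pii("Contact jane.doe@example.org or 555-867-5309."))
```

The point of running this on your side is ordering: redaction happens before the request is built, so the original values never appear in anything the provider could retain.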

Risk 2 — retention, even without training

"Not used for training" is not the same as "not stored." Inputs may still be logged for a period for abuse monitoring or debugging. For regulated data, that retention window itself can be the compliance problem, training or no training. The answer is the same: zero-retention enterprise endpoints, and keeping sensitive data from leaving in the first place.

Risk 3 — leaked API keys

An API key is a password for your software. The common ways they leak are mundane and constant: hardcoded into code that gets pushed to a public (or even private) git repo, exposed in client-side browser code, printed into logs, or pasted into a chat. The consequences range from someone running up an enormous bill on your account to accessing whatever data the key can reach.

Example: a developer commits a key to a public GitHub repo "just for a minute." Automated bots scrape new commits continuously; the key can be found and abused within minutes, long before anyone notices.
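One way to catch this before the push is to scan changed text for key-shaped strings, for instance from a pre-commit hook. The sketch below is a simplified illustration: the "sk-" prefix and the assignment pattern are assumed key shapes, and find_secrets is a hypothetical helper. Dedicated secret scanners ship far more rules than this.

```python
import re

# Hypothetical patterns for illustration; real scanners cover many key formats.
SECRET_PATTERNS = [
    # Assumed "sk-" style API key: prefix plus 20 or more token characters.
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    # A quoted literal of 16+ characters assigned to a secret-sounding name.
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}['\"]"),
]


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for each line that looks like it holds a secret."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = 'API_KEY = "abcd1234abcd1234abcd"\nkey = os.environ["API_KEY"]\n'
    for number, line in find_secrets(sample):
        print(f"possible secret on line {number}: {line}")
```

A check like this only reduces the odds of a leak. If a key does reach a repository, treat it as compromised and rotate it rather than just deleting the commit.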

The fix is basic key hygiene: keep keys server-side only, in a secrets vault — never in client code or git; scope each key to the least access it needs; rotate them regularly; and monitor usage for anomalies so a leak is caught fast.
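Two of these habits are easy to sketch in Python: reading the key from the server's environment instead of the source code, and flagging a usage spike. The variable name MODEL_API_KEY, the function names, and the three-standard-deviation threshold are all assumptions chosen for illustration. A production setup would more likely pull from a secrets vault and use the provider's own usage alerts.

```python
import os
from statistics import mean, stdev


def load_api_key(env_var: str = "MODEL_API_KEY") -> str:
    """Read the key from the environment; MODEL_API_KEY is a hypothetical name."""
    key = os.environ.get(env_var)
    if not key:
        # Fail fast rather than falling back to a hardcoded default.
        raise RuntimeError(f"{env_var} is not set; refusing to start without a key")
    return key


def is_usage_spike(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's request count if it is more than `threshold` standard
    deviations above the mean of the historical daily counts."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    spread = stdev(history)
    if spread == 0:
        return today > history[0]  # flat baseline: any increase is unusual
    return (today - mean(history)) / spread > threshold


if __name__ == "__main__":
    daily_requests = [100, 110, 95, 105, 98]
    print(is_usage_spike(daily_requests, 5000))
```

A crude threshold like this will not catch a slow, low-volume abuse pattern, but it is enough to surface the "enormous bill" scenario within a day instead of at month end.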

The governed answer

Every one of these closes the same way: put one governed gateway between your data and the models. PII is stripped before egress, so sensitive data never reaches a public model. Keys are held server-side in a single vault, never in app code or on a teammate's laptop. Sessions are ephemeral and zero-retention. Your data never enters a training set, and every request is logged so you can prove it. The crown-jewel workloads run on open models inside your environment.
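As a rough picture of that gateway, the Python sketch below chains the steps: redact, call the model with a key the caller never sees, and record an audit entry. Everything here is hypothetical, including the Gateway class, the call_model callable standing in for a provider client, and the email-only redaction. The audit entry stores a hash of the prompt rather than the prompt itself, which is one possible way to log every request without retaining its content.

```python
import hashlib
import re
import time
from typing import Callable

# Email-only redaction keeps the sketch short; a real gateway would do much more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class Gateway:
    """Hypothetical single choke point between callers and a model provider."""

    def __init__(self, call_model: Callable[[str, str], str], api_key: str):
        self._call_model = call_model  # stand-in for a provider client: (key, prompt) -> text
        self._api_key = api_key        # held server-side, never handed to callers
        self.audit_log: list[dict] = []

    def complete(self, user: str, prompt: str) -> str:
        # 1. Strip PII before anything leaves the boundary.
        safe_prompt = EMAIL.sub("[EMAIL]", prompt)
        # 2. Call the model using the gateway's own key.
        response = self._call_model(self._api_key, safe_prompt)
        # 3. Record who sent what, storing a hash instead of the prompt text.
        self.audit_log.append({
            "time": time.time(),
            "user": user,
            "prompt_sha256": hashlib.sha256(safe_prompt.encode("utf-8")).hexdigest(),
            "redacted": safe_prompt != prompt,
        })
        return response


if __name__ == "__main__":
    def echo_model(key: str, prompt: str) -> str:
        return f"model saw: {prompt}"

    gateway = Gateway(echo_model, api_key="loaded-from-vault")
    print(gateway.complete("ana", "Email bob@example.com the memo"))
```

Because every caller goes through complete, there is exactly one place to change the redaction rules, swap providers, or rotate the key.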

The question isn't whether AI is safe to use. It's whether your data crosses a boundary it shouldn't. Control the boundary, and the training-data, retention, and key-leak risks all shrink to something you can manage and audit.

This is the concrete version of what breaks when you skip the governance layer — and exactly what a governed gateway is built to prevent.