You type a question into ChatGPT. Maybe you're asking about a weird symptom at 2 a.m. Maybe you pasted in a messy divorce settlement to ask what it means. Maybe you're venting about your boss or asking for help with a job application that has your address, phone number and work history in it.
You get a helpful answer. You close the tab. Maybe you delete the chat for good measure.
That's where most people think the story ends. It doesn't.
Your text just entered a pipeline with at least seven stages, each with its own risks. It gets stored on a server. It may get read by a human. It may train a future model. Pieces of it may get memorized in ways that can be pulled out later. And as of 2025, it can be kept forever by court order, even after you delete it.
This piece walks through that pipeline from start to finish. Not the marketing version. The actual, sourced, current-as-of-early-2026 version of what happens to your words after you hit Enter.
Stage 1: Your text leaves your device
This part is straightforward but worth stating, because it's the part most people skip past without thinking about.
When you type into ChatGPT, Claude, Gemini or Copilot, your input leaves your computer and travels to the company's servers. Every word. The full prompt. Including whatever you pasted from your clipboard three seconds before hitting Enter.
This is true for every AI tool. There are no exceptions. The AI doesn't run on your laptop. It runs on computers owned by OpenAI, Google, Anthropic or Microsoft, and your text has to travel there for the system to work.
That means anything on your clipboard at the moment you paste goes to their servers too. The home address in that email you copied. The credit card number in that spreadsheet. The Social Security number in that tax form. Not because anyone is trying to steal it. Because that's how the design works.
The data is scrambled while it's traveling to their servers, so nobody can intercept it in between. (The technical term is "encrypted in transit.") That's the good news. The question is what happens once it arrives.
Stage 2: It sits on a server (longer than you think)
Every major AI company stores your chats on their servers. How long depends on the company, your plan and, in one case, a federal judge.
| Platform | Free tier | After deletion | Business tier |
|---|---|---|---|
| ChatGPT | Forever (until you delete) | Purged within 30 days | Zero Data Storage option |
| Claude | 5 years (if training on) / 30 days (if off) | 30 days | Unaffected by consumer policy |
| Gemini | Up to 18 months (adjustable) | 72 hours if tracking off | Separate controls, no training |
| Copilot | Microsoft standard privacy terms | Varies | IT admin controls |
ChatGPT (OpenAI): For free and Plus users, chats are stored forever unless you manually delete them. If you delete a chat, OpenAI says it's purged within 30 days. Temporary Chat mode also auto-deletes within 30 days. Business and API customers with Zero Data Storage agreements get true deletion: nothing is kept after the session ends.
Claude (Anthropic): Until August 2025, Anthropic deleted free-tier chats within 30 days by default. Then the policy changed. Users on Free, Pro and Max plans who allow their data to be used for training now face a five-year storage window. Users who opt out keep the 30-day window. Business, Government, Education and API customers are unaffected.
Gemini (Google): Consumer Gemini chats are tied to your Google Account and stored for up to 18 months by default, adjustable to 3 or 36 months. If you disable activity tracking entirely, Google says it retains chats for up to 72 hours. Workspace paid users get separate controls and no-training guarantees.
Copilot (Microsoft): The work version of Copilot for Microsoft 365 follows whatever rules your company's IT team sets up. The free consumer version follows Microsoft's standard privacy terms, which most people never read.
The important thing here is that "stored on a server" means exactly what it sounds like. Your chat exists as a readable record in a data center. It can be accessed by employees doing safety reviews. It can be demanded by a court. It can be leaked in a breach.
The training question gets all the attention. The storage question is where the immediate risk lives.
Stage 3: Humans may read it
Every major provider reserves the right to have human reviewers read your chats for safety, quality and abuse detection.
Google's Gemini privacy hub says it plainly: interactions may be reviewed by trained human reviewers, including those working for service providers. OpenAI uses human reviewers for safety monitoring and model evaluation. Anthropic uses a combination of automated systems and human review.
The odds of any single chat being read by a person are low. But the chance is not zero. And the chats most likely to be flagged for review are the ones that trip the company's automated filters: medical questions, legal scenarios, content that looks like it might involve harm or misuse.
In other words, the chats you'd most want to keep private are the ones most likely to get a second pair of eyes.
Business tiers mostly exclude human review. OpenAI's Business and API products, Anthropic's business plans and Google's Workspace-integrated Gemini all commit to no human review of customer content. But the free versions make no such promise.
Stage 4: It may train a future model
This is the part everyone asks about. So here's how it actually works.
What "training" means
The short version: your chats enter a massive pool with millions of others. The model processes that pool to learn patterns, not to memorize individual conversations. How people phrase questions. What makes an answer useful. How language works in different contexts. (We wrote a full explainer on what AI training actually means if you want the deeper version.)
That's the official story. It's mostly true. But the word "mostly" matters, and we'll get to why in Stage 5.
Who trains on what right now
The training picture shifted sharply in 2025. As of early 2026, all four major platforms (ChatGPT, Claude, Gemini and Copilot) train on free-tier conversations by default. All four let you opt out. All four exclude business and enterprise customers. The opt-out exists, but it's buried in settings menus, and the default favors the company.
We wrote a step-by-step opt-out guide covering exactly where to find the toggle on each platform and what changes when you flip it.
What opt-out actually does (and doesn't)
Opting out of training stops your future chats from entering the training pipeline. It does not:
Opt-out does
- Stop future chats from entering the training pipeline
- Reduce your data's exposure going forward
Opt-out doesn't
- Delete conversations already on their servers
- Remove patterns already absorbed by a trained model
- Prevent retention for safety review or legal compliance
- Stop a court from ordering data preservation
Opting out of training is worth doing. But it's not the same as privacy. Your data still exists on their servers, still subject to their storage policies, still accessible under subpoena.
Stage 5: The model may memorize it
Here's where the official story gets complicated.
AI companies say the model "learns patterns, not specifics." That's the standard reassurance. And for most data, it's accurate. The model doesn't keep a copy of your conversation. It absorbs the statistical relationships between words and moves on.
But researchers have proven, again and again, that large language models can and do memorize specific pieces of training data. Not just patterns. Actual verbatim text.
The research
The original alarm came in 2023 when Google DeepMind researchers extracted over 10,000 verbatim training examples from ChatGPT for about $200. (We covered this in our training explainer.) OpenAI patched that specific attack. But the core dynamic hasn't changed.
What's happened since is worse. In 2024, researchers showed that fine-tuning APIs could undo a model's safety training and reactivate memorized content. A 2025 paper introduced a framework for steering models into high-uncertainty states where they regurgitate memorized text. And a December 2025 survey confirmed that memorization scales with model size: bigger models memorize more data, memorize it faster and are more vulnerable to extraction.
Each new generation of model gets bigger. Each one memorizes more. The attacks keep evolving faster than the patches.
What gets memorized
Data that appears only once or twice in the training set is more likely to be retained word for word than common patterns. Your tax return is more memorable to a model than a question about the weather. This is exactly backwards from what you'd want: the most personal, one-of-a-kind data is the data most likely to stick.
What this means practically
The risk isn't that ChatGPT will spontaneously recite your conversation to a random user. That's extremely unlikely. The risk is that a dedicated attacker with the right techniques can probe a model and extract training data that should never have been there. And that the boundary between "learning patterns" and "memorizing text" is blurrier than any company will say in a marketing page.
If your chat was used for training (and you didn't opt out), pieces of it may exist inside the model in a form that is, under the right conditions, pulled back out. Not probably. Not theoretically. Proven, according to peer-reviewed research from Google's own labs.
Stage 6: It can be used as evidence against you
This is the newest and possibly most consequential part of the pipeline.
The Heppner ruling
On February 10, 2026, a federal judge in New York ruled that 31 documents created using consumer Claude were not protected by attorney-client privilege. The reasoning: Anthropic's privacy policy allows data sharing with third parties, including the government. The platform's terms of service killed the legal protection. (We covered the full ruling and its implications in a separate post.)
This ruling applies to the legal profession directly, but the core principle is broader. When you type into an AI tool, the data is governed by that company's terms, not by your intentions. If those terms allow disclosure, your data can be disclosed.
The NYT preservation order
In May 2025, a federal judge ordered OpenAI to preserve all ChatGPT chat logs forever as part of the New York Times copyright lawsuit. This included conversations users had already deleted. OpenAI's 30-day deletion promise was overridden by judicial order.
The preservation order covered ChatGPT Free, Plus, Pro and Team users. Business-tier customers and API users with Zero Data Storage agreements were excluded. OpenAI successfully challenged the indefinite retention in September 2025, restoring standard deletion for new conversations going forward. But for chats from April through September 2025, the data remains locked in a separate system that OpenAI's legal team can access.
The New York Times then demanded access to 20 million ChatGPT conversations to search for evidence that users were circumventing its paywall. OpenAI has fought this demand, but the precedent is set: your AI conversations can be made the subject of litigation, even litigation you have nothing to do with.
The 300 million message leak
In January 2026, a security researcher found that Chat & Ask AI, a popular app that lets you talk to ChatGPT, Claude and Gemini from one place (with over 50 million users), had left its entire database wide open on the internet. No password needed. Anyone who knew where to look could read everything. Three hundred million messages were sitting there, exposed. The data included suicide notes, workplace secrets, illegal activity and personal relationships.
The AI companies' servers were fine. The third-party app that sat on top of them had no protection at all. People who thought they were talking to "ChatGPT" were actually talking through a middleman that stored everything in an unlocked database.
This is a critical point. Even if OpenAI, Anthropic and Google run perfect security (which no company does forever), your data's safety depends on every link in the chain. The app you used to access the AI. The browser extension. The Slack bot. The "AI-powered" tool that calls an API behind the scenes. Each one stores your data under its own rules, on its own servers, with its own level of care.
Stage 7: Agents make everything worse
Everything above applies to the simple case: you type, the AI responds, the chat is stored. But the AI industry is moving hard toward agents, systems that don't just answer questions but take actions. They browse the web. They read your email. They execute code. They interact with other tools on your behalf.
OpenAI published blog posts in early 2026 acknowledging that prompt injection attacks against AI agents like ChatGPT Atlas "may never be fully solved." They demonstrated an attack where a malicious email caused the Atlas agent to send a resignation letter to the user's CEO instead of drafting an out-of-office reply.
When an agent reads your email, opens your files and checks your calendar, the scope of what's exposed isn't just what you typed into a chat box. It's everything the agent can see. And a trick called "prompt injection" (hiding secret instructions inside a web page, a document or an email that the AI reads) can hijack what the agent does next. The AI can't tell the difference between real tasks and planted ones.
In February 2026, the European Parliament disabled AI features on every lawmaker's device because its IT team couldn't guarantee where the data was going. The body that wrote the EU AI Act concluded it couldn't yet trust AI tools on its own hardware. If they can't figure out where their data ends up, most people can't either.
The enterprise gap
You've probably noticed a pattern in everything above. Free plans get trained on, retained longer, read by humans and preserved by court order. Enterprise plans mostly don't.
This isn't a coincidence. It's a two-tier system. The paying business customer gets real privacy guarantees: no training, shorter storage, the option to have nothing kept at all and signed agreements for health data (called BAAs, which HIPAA requires). The free user gets defaults that favor the company, opt-outs buried in settings and storage rules that can change with a terms-of-service update.
The gap creates a false sense of safety. Say someone at a law firm uses ChatGPT's paid work version all day with full data protections. They go home, open the free version on their phone, paste a client document in for a quick look. Same tool. Same screen. Totally different rules for what happens to that data. The work protections don't follow them home.
Worse, 47% of enterprise AI users are still accessing AI tools through personal accounts according to Netskope's 2026 Cloud and Threat Report. The paid tier exists. Nearly half the people in work settings aren't using it.
What you can actually do
None of this means you should stop using AI tools. They're too useful for that argument to land. But using them with your eyes open changes what you paste in and where you paste it.
Opt out of training on every tool you use. It takes a few minutes per tool and it's the single most impactful step. Our step-by-step guide covers ChatGPT, Claude, Gemini and Copilot.
Treat the paste as the risk. The moment you hit Ctrl+V (or Cmd+V on a Mac) is the moment data leaves your control. Before you paste, glance at what's on your clipboard. Is there a phone number in that email? A bank account number in that PDF? Your kid's name and school in that form you were filling out?
Know which tier you're on. Paid work plans with signed privacy contracts offer real protections. Free plans offer defaults designed for the company's benefit. If your employer pays for a work AI plan, use it. If you're on a personal account doing anything sensitive, understand the tradeoff.
Assume it's being kept. Your chat sits on a server somewhere for at least 30 days, often much longer. A court can order the company to keep it forever. A breach at the company or any third-party app in the chain can expose it. The safest data is the data that never left your device.
BeatMask catches sensitive data before it's submitted to any AI tool. On your device, before anything reaches a server. Nothing leaves. Nothing is logged.
The safest data is the data that never left your device.