Why AI Optimizes for 'It Works,' Not 'It's Secure'

Why is AI code insecure? Because models optimize for code that runs and demos well, not code that's safe. The mechanism behind AI code security problems, explained.

Barret · 8 min read

The AI built exactly what you asked for. The feature works. The preview looks finished. So why does every security article insist your app is probably full of holes? It feels contradictory — if the code were bad, surely it wouldn't run?

That's the trap, and it's worth understanding why. AI code is insecure not because the model is incompetent, but because of what it's optimized to do. It's trained and rewarded to produce code that works and demos well. Security is a separate property the model isn't aiming at — and one that's invisible in a working preview.

This post explains the mechanism: why "it works" and "it's secure" are different things, why AI reliably nails the first and misses the second, and what that means for how you should treat anything an AI builds.

⚡ TL;DR

  • AI is optimized to produce code that runs and looks done. Security is invisible in a working demo, so it gets left out unless you explicitly ask for it.
  • Models learn from real-world code, which is full of insecure shortcuts — so they reproduce those shortcuts by default.
  • The practical takeaway: "it works" is not a security signal. The only way to know an app is safe is to test the deployed version.

"Works" and "secure" are not the same property

When you ask an AI to "let users save their profile," there's a clear, testable target: after the prompt, can a user save a profile? The model can aim straight at it, and you can instantly see whether it hit. Success is visible.

Security is the opposite. A secure version of that feature and an insecure version look identical in the preview. Both let a user save a profile. The difference only shows up when someone who isn't supposed to — an anonymous stranger, a different logged-in user — tries to read or change that profile. Nothing in the demo exercises that path. So the insecure version passes every check you can see, and the hole stays invisible until someone goes looking.

Put plainly: "it works" measures the happy path. Security measures the unhappy paths — the ones an attacker takes. A working demo tells you nothing about those, because nobody ran them.
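
To make that concrete, here is a minimal sketch with two versions of a "save profile" handler. Everything in it (the in-memory store, the function names, the user names) is hypothetical and invented for illustration. It is not what any particular AI tool generates. The point is only that the happy path cannot tell the two versions apart.

```python
# Hypothetical in-memory profile store; all names here are illustrative.
PROFILES = {"alice": {"bio": "hi"}, "bob": {"bio": "hello"}}


def save_profile_insecure(current_user, target_user, bio):
    """Saves whichever profile the request names. No ownership check."""
    PROFILES[target_user]["bio"] = bio
    return "ok"


def save_profile_secure(current_user, target_user, bio):
    """Same feature, but only the logged-in owner may write."""
    if current_user is None or current_user != target_user:
        return "forbidden"
    PROFILES[target_user]["bio"] = bio
    return "ok"


def happy_path(save):
    # What the preview exercises: a user editing their own profile.
    return save("alice", "alice", "new bio")


def unhappy_path(save):
    # What an attacker tries: a user editing someone else's profile.
    return save("bob", "alice", "defaced")


for save in (save_profile_insecure, save_profile_secure):
    print(save.__name__, happy_path(save), unhappy_path(save))
# save_profile_insecure ok ok
# save_profile_secure ok forbidden
```

Both handlers return "ok" on the happy path. Only the second call, the one nobody makes during a demo, separates them.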

The model is optimizing for the demo

AI coding tools are tuned, end to end, to make you successful in the moment. You prompt, it generates, the preview updates, the feature works, you're delighted. That feedback loop rewards one thing above all: a result that visibly functions.

Security doesn't fit that loop. Adding a proper access rule, locking down a storage bucket, or writing a server-side permission check makes the demo no better — and occasionally makes it briefly worse (a freshly secured database returns nothing until you add policies, which can look like a bug). There's no reward signal pushing the model toward the secure choice, and a small one pushing it away. So it builds the version that demos cleanest.

This is why the "make it work" style of prompting consistently produces the insecure shortcut. Ask for the feature and you get the feature, implemented the fastest, most demo-friendly way, which is often the least safe way. The classic example is a database access rule that amounts to "allow everyone": a row-level security policy written with USING (true), a pattern often seen in Supabase projects. It makes the feature work instantly in the preview and leaves the table wide open. The model isn't being reckless. It's giving you exactly what the prompt rewarded.
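
The effect of an access rule can be sketched with a toy model. This is not Supabase or Postgres code. It is a simplified Python stand-in, with hypothetical names throughout, that assumes a row is visible when at least one policy allows it and that a secured table with no policies returns nothing.

```python
# Toy model of row-level security; this is not Supabase or Postgres code.
ROWS = [
    {"id": 1, "owner": "alice", "note": "alice's note"},
    {"id": 2, "owner": "bob", "note": "bob's note"},
]


def visible_rows(rows, policies, user):
    """Return the rows that at least one policy allows for this user.

    With no policies the result is empty, mirroring a freshly secured
    table that returns nothing until a policy is added.
    """
    return [row for row in rows if any(policy(user, row) for policy in policies)]


def allow_everyone(user, row):
    # The shortcut: equivalent in spirit to USING (true).
    return True


def owner_only(user, row):
    # The stricter rule: a row is visible only to its logged-in owner.
    return user is not None and row["owner"] == user


print(len(visible_rows(ROWS, [], "alice")))              # 0
print(len(visible_rows(ROWS, [allow_everyone], None)))   # 2
print(len(visible_rows(ROWS, [owner_only], "alice")))    # 1
print(len(visible_rows(ROWS, [owner_only], None)))       # 0
```

The first line of output is the "looks like a bug" moment. The second is the shortcut that makes it go away: an anonymous caller (None) now sees every row.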

Models learn insecure patterns from their training data

There's a second force at work. AI models learn to code by absorbing enormous amounts of real-world code. And real-world code is full of insecure patterns — quick examples, tutorials that skip security "for clarity," Stack Overflow answers that solve the immediate problem and ignore the rest.

When a model has seen the insecure shortcut thousands of times and the secure version far less often, it reproduces the shortcut. It's reflecting the average of what people actually wrote, and the average is not secure. The model didn't invent the bad pattern. It inherited it from us.

This is also why the problem doesn't fix itself with scale. Insecurity isn't an accident the model will grow out of. It's baked into both the training data and the optimization target.

There's no threat model unless you ask for one

A human developer building a login feature carries an invisible checklist: What if someone guesses another user's ID? What if they replay this request? What if they're not logged in at all? That checklist is a threat model — a habit of imagining the attacker.

The AI has no such habit unless you put it in the prompt. Left to its own devices, it builds for the cooperative user who does exactly what the feature expects. It does not, on its own, imagine the person trying to break it. So the defenses against that person — permission checks, rate limits, locked-down rules — never appear. Not because the model couldn't write them, but because nothing asked it to think about the attacker.

You can test this yourself: ask an AI to build a feature, then separately ask "how would an attacker abuse this?" It can often answer the second question well. It just doesn't ask itself that question while building, because that's not what it's optimized to do.
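
One way to make the checklist explicit is to write it down as data and run it against the handler. The sketch below is an illustration under assumptions: the handler, the request IDs, and the result strings are all hypothetical, and a real app would track sessions and replayed requests in a proper store rather than a module-level set.

```python
# Hypothetical handler and request log, for illustration only.
SEEN_REQUEST_IDS = set()


def change_email(session_user, target_user, request_id):
    """Change an account email, rejecting the three abuse cases on the checklist."""
    if session_user is None:
        return "unauthenticated"   # not logged in at all
    if session_user != target_user:
        return "forbidden"         # guessing another user's ID
    if request_id in SEEN_REQUEST_IDS:
        return "replayed"          # replaying an earlier request
    SEEN_REQUEST_IDS.add(request_id)
    return "ok"


# The threat model as data: (description, handler arguments, expected result).
# Order matters: the second case replays the first case's request ID.
THREAT_MODEL = [
    ("owner changes own email", ("alice", "alice", "req-1"), "ok"),
    ("same request replayed", ("alice", "alice", "req-1"), "replayed"),
    ("other user's ID guessed", ("bob", "alice", "req-2"), "forbidden"),
    ("caller not logged in", (None, "alice", "req-3"), "unauthenticated"),
]


def run_threat_model(handler, cases):
    """Return the descriptions of the cases where the handler misbehaves."""
    return [desc for desc, args, expected in cases if handler(*args) != expected]


print(run_threat_model(change_email, THREAT_MODEL))  # []
```

A handler that skips the checks and always returns "ok" would pass the first case and fail the other three, which is the cooperative-user build described above.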

The research says this isn't getting better

This isn't a hunch — it's measured. Independent testing keeps landing on the same conclusion.

Veracode's 2025 GenAI Code Security Report tested over 100 large language models across 80 coding tasks. About 45% of the code samples failed security tests or introduced an OWASP Top 10 vulnerability. The report's blunt summary, from CTO Jens Wessling: "Our research reveals GenAI models make the wrong choices nearly half the time, and it's not improving." That last clause is the important one. More capable models write code that works better — not code that's safer.

Carnegie Mellon's SusVibes benchmark found the same split from another angle. Even the best setup reached strong functional results but passed security checks on only a small fraction of tasks — meaning the large majority of functionally correct solutions still contained vulnerabilities. The code worked. It just wasn't safe. That gap — correct but insecure — is exactly the "works isn't secure" problem, quantified.

For a deeper walk through these numbers, see the research on AI-generated code.

How to check your own app

Once you accept that "it works" isn't a safety signal, the path forward is clear: you have to look at the parts the demo never exercised.

  1. Test the unhappy paths. Don't just use your app as intended. Try to read another user's data by changing an ID. Try to reach a paid feature without paying. Try to call your API logged out. These are the paths the AI didn't build for.
  2. Inspect the rules, not the behavior. A feature working tells you the rule lets you through. Check whether it stops everyone else: open your database access policies, your storage bucket settings, your endpoint permissions.
  3. Look at what ships to the browser. Open the network tab and your JavaScript bundle. Anything an attacker can see — keys, endpoints, logic — is part of your attack surface, regardless of how clean the demo looked.
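
The third check can be partly automated. Below is a minimal sketch, assuming you have already saved your JavaScript bundle as text. The three patterns are a small, non-exhaustive sample (Stripe live secret keys, AWS access key IDs, PEM private key headers), and the function name and sample bundle are made up for illustration. A match is a lead to investigate, not a verdict.

```python
import re

# A small, non-exhaustive set of patterns; real scanners check many more.
SECRET_PATTERNS = {
    "Stripe live secret key": re.compile(r"sk_live_[0-9A-Za-z]{10,}"),
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "PEM private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_exposed_secrets(bundle_text):
    """Return (label, redacted match) pairs for secret-looking strings."""
    findings = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(bundle_text):
            # Keep only a short prefix so the report does not leak the key again.
            findings.append((label, match.group(0)[:8] + "..."))
    return findings


# Made-up bundle contents; the key is a fake placeholder built from x's.
sample_bundle = (
    'const stripe = Stripe("sk_live_' + "x" * 24 + '"); fetch("/api/profile");'
)
print(find_exposed_secrets(sample_bundle))
# [('Stripe live secret key', 'sk_live_...')]
```

An empty result does not mean the bundle is clean. It only means none of these three patterns appeared.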

What this means for how you build

You don't have to stop using AI tools. You have to stop trusting the wrong signal.

The mental shift is this: treat a working preview as a starting point, not a finished one. The AI got you a feature that runs. It did not, and was never trying to, get you a feature that's safe against someone attacking it. Those are two different jobs, and the second one is still yours.

In practice that means asking explicitly for the threat model — "now make sure a stranger can't read another user's rows" — and then verifying the deployed app rather than taking the AI's word for it. Asking helps. But the model that wrote the insecure version is not the most reliable judge of whether it fixed it. The deployed app is the source of truth, and the only honest test is to probe it the way an attacker would. That's also why this is continuous: every new prompt ships new code, and new code reopens the same gap.
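
Because the check has to be repeated after every change, it helps to script it. The following is a rough sketch of a logged-out probe using only Python's standard library. The URLs are placeholders on a reserved test domain, the function names are hypothetical, and the demo run uses a fake fetcher so it works offline. Point it only at an app you own.

```python
import urllib.error
import urllib.request


def fetch_status(url):
    """GET a URL with no cookies or auth header and return the HTTP status."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx and 5xx; the status code is what we want.
        return error.code


def check_logged_out(urls, fetch=fetch_status):
    """Return the URLs that answer an anonymous request with a 2xx status."""
    exposed = []
    for url in urls:
        status = fetch(url)
        if 200 <= status < 300:
            exposed.append(url)
    return exposed


# Stand-in for a live app so the sketch runs offline. To probe a real
# deployment, pass a list of your own URLs and omit the fetch argument.
fake_responses = {
    "https://example.test/api/me": 401,
    "https://example.test/api/users/42": 200,
}
print(check_logged_out(fake_responses, fetch=fake_responses.get))
# ['https://example.test/api/users/42']
```

A 2xx answer to an anonymous request is a prompt to look closer, not proof of a leak. Some endpoints are meant to be public, and because urlopen follows redirects, a redirect to a login page also ends in a 200.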

FAQ

Does this mean AI-generated code is always insecure?

No — it means security is not guaranteed and not the default. Plenty of AI-generated code is fine, especially when you prompt for security explicitly and verify the result. The point is that you can't assume it's safe just because it works. The base rate of vulnerabilities is high enough (around 45% of samples in Veracode's testing) that "probably fine" isn't a safe bet for a real app with real users.

Won't better models fix this on their own?

The evidence says no, at least not automatically. Veracode's finding that it's "not improving" suggests that gains in capability have not carried over to safety. A smarter model writes a slicker feature, but it still isn't imagining the attacker unless you make it. Better models raise the floor on "works," not on "secure."

If I just prompt for security, am I covered?

It helps a lot, but it's not a guarantee. Prompting for specific protections — real access rules, server-side permission checks, locked storage — gets you far better output than "make it work." But the same model that misses security can also miss it while claiming to add it. Prompt for it, then confirm by testing the live app. Trust, but verify.

The bottom line

AI code is insecure because the model is optimized for code that works and demos well, not code that's safe — and security is invisible in a working preview. It reproduces insecure patterns from its training data and doesn't imagine the attacker unless you tell it to. The research confirms this and says it isn't improving. So "it works" can't be your security signal. An attacker can find the gaps in minutes by testing the paths your demo skipped. You should find them first, on the deployed app, and keep checking, because every new prompt ships new code and new holes.
