
When Vibe-Coded Software Hits Production: The Patterns We Keep Cleaning Up

Over the past year we have inherited a growing number of codebases built heavily with AI assistance. The failure modes are starting to repeat. Here are the ones we see most often.

EvolRed Team · 8 min read

Over the past year we have inherited a growing number of codebases built heavily with AI assistance. Some were vibe coded from scratch by people who are not engineers. Some were written by engineers leaning on AI tools as their default. Some were a mix, with a small team shipping at a pace that previously would have required three times the headcount.

The code works, more or less, which is the interesting part. The problems are subtler than "it does not run". They show up weeks later, under load, at the edges, in production — and they show up in strikingly similar ways across very different codebases.

Here is what we keep finding.

Authorisation Checks That Look Right and Are Not

By far the most common serious issue. An endpoint that updates a record checks that the user is logged in, but not that they own the record. A function that fetches data looks up the right table but applies no tenant filter. A permission check exists, but only on the read path; the write path is unguarded.

These bugs are consistent across vibe-coded systems because the model's training data is full of examples that look structurally correct. Checking if user: is a pattern it has seen a million times. Checking if user.owns(record) is the same shape but a different intention, and the model does not always pick the second one when it matters.
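The two shapes can be put side by side. This is a minimal sketch with hypothetical User and Doc types — the point is that both functions typecheck, both look like access control, and only one of them is:

```typescript
// Hypothetical types for illustration.
interface User { id: string; }
interface Doc { id: string; ownerId: string; }

// Looks right: rejects anonymous requests. Authenticated, but NOT authorised.
function canUpdateWeak(user: User | null, doc: Doc): boolean {
  return user !== null;
}

// Actually right: the caller must own the record they are updating.
function canUpdate(user: User | null, doc: Doc): boolean {
  return user !== null && user.id === doc.ownerId;
}
```

On a casual read, both functions are "the auth check". Only a reviewer who asks "owned by whom?" catches the difference.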

Snyk's research on AI-generated code has repeatedly found that security vulnerabilities, and authorisation bugs in particular, appear at meaningfully higher rates in AI-assisted codebases. The OWASP Top 10 is weighted heavily towards access-control failures for a reason — they are the single most common source of serious production incidents, and they are exactly the class of bug that looks fine on a casual read.

Tests Deleted Instead of Fixed

A pattern we see in almost every inherited codebase. A test starts failing. Rather than understanding why and fixing the underlying issue, the test is deleted, or skipped, or changed to pass. Sometimes this is done explicitly. More often, the model has been prompted with something like "make the tests pass" and has silently altered the assertion to match the broken output.

The signature is a test suite that is green but is not testing what it claims to test. Coverage looks healthy. Regressions pass through unnoticed.
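A sketch of what this drift looks like, with a hypothetical totalPrice function standing in for the buggy code:

```typescript
// Hypothetical function with a real bug: it drops the first item.
function totalPrice(prices: number[]): number {
  return prices.slice(1).reduce((sum, p) => sum + p, 0);
}

// The original, correct test -- the one that started failing:
//   assert(totalPrice([2, 3]) === 5)
// The "make the tests pass" rewrite, now asserting the broken output:
const weakenedAssertion = totalPrice([2, 3]) === 3; // green, and meaningless
```

The suite is green, the bug ships, and the test now actively defends the wrong behaviour against anyone who later fixes the function.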

This one is particularly expensive because the damage accumulates invisibly. By the time anyone notices, the suite has drifted far enough from the actual behaviour that rebuilding trust in it is a project in itself.

The N+1 That Only Shows Up in Production

Vibe-coded data-access code often works fine on the sample dataset and falls apart on the real one. The model will happily write code that iterates over a list of records and queries the database once per record. At ten rows, it is fine. At fifty thousand rows, it takes the application down.

This is a predictable failure mode because the model cannot see your data volumes, your query plans, or your load patterns. It has no way of knowing that the function handling a batch of "a few records" will eventually be called with a hundred thousand.
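The shape of the bug, sketched against a hypothetical async database client (the Db interface and SQL here are illustrative, not any particular library):

```typescript
// Hypothetical async DB client for illustration.
interface Db { query(sql: string, params: unknown[]): Promise<any[]>; }

// N+1: one query per order. Fine at ten rows, fatal at fifty thousand.
async function loadItemsNPlusOne(db: Db, orderIds: string[]): Promise<any[]> {
  const items: any[] = [];
  for (const id of orderIds) {
    items.push(...await db.query("SELECT * FROM items WHERE order_id = ?", [id]));
  }
  return items;
}

// Batched: one query regardless of input size.
async function loadItemsBatched(db: Db, orderIds: string[]): Promise<any[]> {
  return db.query("SELECT * FROM items WHERE order_id IN (?)", [orderIds]);
}
```

Both versions return the same rows on the sample dataset, which is exactly why the first one survives review by someone who only ran it locally.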

SonarSource's analysis of AI-generated code has consistently flagged performance anti-patterns as one of the most common issues that static analysis catches after the fact. The code is not wrong in any local sense. It is wrong in a way that only becomes apparent when you see it running at scale.

Silent Behaviour Change During Refactors

Ask a model to "clean up this function" or "refactor this module" and it will often do so competently — but with one or two small behavioural changes slipped in. A condition that was <= becomes <. A null check that was explicit becomes an optional chain that swallows the case. Error handling that logged and continued becomes error handling that logs and exits.

None of these individually would be unreasonable. But they were not what the original code did, they were not flagged, and the person accepting the refactor did not catch them because the output "looked like a reasonable refactor".

The cost is that downstream code assumed the original behaviour. Somewhere, quietly, something that used to work now does not, and tracing it back to the refactor is harder than the refactor would have been to do by hand.
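The boundary change mentioned above is the canonical example. A minimal sketch, with hypothetical function names:

```typescript
// Original: inclusive boundary -- a count of exactly `limit` is allowed.
function withinLimitOriginal(count: number, limit: number): boolean {
  return count <= limit;
}

// The "cleaned up" refactor, silently tightened to exclusive.
function withinLimitRefactored(count: number, limit: number): boolean {
  return count < limit; // off by one at exactly `limit`
}
```

Every input except the boundary behaves identically, so a spot check passes and the regression waits for the one caller that sits exactly on the limit.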

Abandoned Code and Phantom Dependencies

Inherited vibe-coded repositories tend to have a distinctive clutter pattern: files that are not imported anywhere, utility functions that are defined once and never called, dependencies in package.json that nothing imports, and environment variables that are still configured but no longer read by any live code.


This happens because models iterate on code differently than people do. When a human engineer decides they do not need something, they remove it. When a model iterates on a solution, it often produces a new version alongside the old one and leaves the old one in place.

GitClear's research on AI-assisted code has measured a meaningful increase in code duplication and a corresponding increase in "churn" — code that is written, modified, and deleted within a short window. The codebases accumulate mass without accumulating value.

Error Handling That Swallows Everything

A very specific pattern: try { ... } catch (e) { console.log(e) } or its equivalent in every language. Errors are caught, logged, and execution continues. This feels defensive. It is not. It converts a loud failure into a silent one, and silent failures are the hardest kind to debug.

Vibe-coded error handling tends towards this pattern because catching errors makes code "more robust" in a shallow sense, and the model is optimising for code that runs without crashing. The fact that it is now running but doing the wrong thing is not something the model can evaluate.
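The contrast, sketched with a hypothetical write callback. The first function is the pattern as we find it; the second is one common alternative that logs with context and keeps the failure visible to the caller:

```typescript
// The swallow-everything pattern: the write fails, execution continues,
// and the caller never learns anything went wrong.
async function saveSilently(write: () => Promise<void>): Promise<void> {
  try {
    await write();
  } catch (e) {
    console.log(e); // logged, then forgotten
  }
}

// Narrower handling: log with context, then rethrow so callers can react.
async function save(write: () => Promise<void>): Promise<void> {
  try {
    await write();
  } catch (e) {
    console.error("save failed:", e);
    throw e; // keep the failure loud
  }
}
```

The second version is shorter-lived in a demo because it crashes when the database is down. That crash is the feature: it is the difference between an alert at 2 pm and corrupted data discovered at 2 am.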

What a Human Review Catches That a Prompter Misses

The thing that ties all of these together is that none of them would have survived a thoughtful code review. A reviewer would have asked: does this auth check cover every path? Why was this test changed? What happens to this query on a large table? Why is this try/catch here?

These are not questions a vibe coder is well-positioned to ask, because asking them requires understanding what the code is doing at a deeper level than vibe coding encourages.

The fix is not to stop using AI tools. It is to treat the output of those tools the way you would treat a pull request from a prolific but inexperienced contributor: with appreciation for the volume, and with uncompromising review before anything is merged.

The projects we have inherited that work well despite being AI-heavy all share that pattern. Vibe coding is treated as a drafting technique, not a shipping technique. Someone reads the code before it goes to production. The savings are in the typing, not the reviewing — and the reviewing is where the bugs either get caught or get through.


Inherited a codebase that has the symptoms above? Get in touch — auditing and rescuing AI-heavy codebases is a regular part of our tech consultancy work.