We Scanned 100 AI-Built Apps. Here's What They All Missed.
We analyzed 100 repos built with Cursor, Lovable, Bolt, and v0. The results reveal a consistent pattern: AI tools build fast but leave critical gaps. Here's the data.
We scanned 100 repositories built with AI coding tools. Not toy projects or tutorials, but real applications their builders intended to ship to real users: SaaS products, marketplaces, dashboards, internal tools. The kind of thing you'd build over a weekend with Cursor or Lovable and then want to put in front of customers.
The results tell a clear story: AI tools are excellent at building. They are consistently poor at finishing.
Here's what we found.
Methodology
We analyzed 100 publicly accessible repositories, selected to represent the range of what people build with AI tools:
- 38 repos built primarily with Cursor
- 29 repos built with Lovable
- 21 repos built with Bolt
- 12 repos built with v0 (typically combined with manual development)
Each repo was scanned across six categories: security, error handling, testing, deploy configuration, performance, and UI/UX completeness. We looked at the code as-is: not what it could become with more work, but what was actually committed and, in many cases, deployed.
The Top-Line Numbers
| Finding | % of Apps Affected |
|---|---|
| Missing error boundaries | 78% |
| No test files of any kind | 72% |
| At least one exposed secret or API key | 31% |
| Missing rate limiting on public endpoints | 89% |
| No input validation on forms | 64% |
| Missing loading states on async operations | 58% |
| Build configured with ignoreBuildErrors | 34% |
| No error tracking or monitoring setup | 91% |
| Missing security headers | 85% |
| No database migration files | 67% |
These percentages represent the apps as we found them. Some may have addressed these issues before deploying to production. But the gap between "code in the repo" and "production-ready" is exactly what this research measures.
Security: The Biggest Gap
Security was consistently the weakest category across all tools and all project types.
31% of apps had at least one exposed secret. These ranged from Supabase service role keys in client-side code to Stripe secret keys in environment files committed to the repo. In some cases, the keys were still active and provided full access to the project's database or payment infrastructure.
89% had no rate limiting on any public-facing endpoint. This means authentication flows, form submissions, and API routes could be called unlimited times per second. For apps with email-sending capabilities (password reset, notifications), this is an invitation for abuse.
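The fix does not have to be elaborate. A minimal fixed-window limiter, sketched below, is enough to blunt brute-force and abuse traffic; the function name and shape are illustrative, and a real deployment behind multiple instances would use a shared store such as Redis rather than in-process memory:

```javascript
// Minimal in-memory fixed-window rate limiter (a sketch, not production-grade).
// Keys are typically the caller's IP address or user ID.
function createRateLimiter({ limit, windowMs }) {
  const hits = new Map(); // key -> { count, windowStart }
  return function allow(key, now = Date.now()) {
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      // First request in a fresh window: reset the counter.
      hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```

In an API route you would call `allow(ip)` at the top of the handler and respond with HTTP 429 when it returns false.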
64% had no input validation on user-facing forms. Whatever a user typed was sent directly to the backend without checking type, length, format, or content. In apps using raw SQL or string interpolation, this created potential injection vectors.
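Server-side validation is a few lines per form. The sketch below is hand-rolled to stay dependency-free; in practice a schema library such as zod is the usual choice, and the field names here are illustrative:

```javascript
// Hand-rolled server-side validation sketch for a hypothetical signup form.
// Every check happens on the server, regardless of any client-side validation.
function validateSignup(body) {
  const errors = [];
  if (typeof body.email !== 'string' || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(body.email)) {
    errors.push('email: must be a valid address');
  }
  if (typeof body.name !== 'string' || body.name.length < 1 || body.name.length > 100) {
    errors.push('name: must be 1-100 characters');
  }
  return { valid: errors.length === 0, errors };
}
```

Rejecting invalid input at the boundary also means the database layer never sees unconstrained strings, which closes off the injection vectors mentioned above.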
Security by Tool
| Security Issue | Cursor | Lovable | Bolt | v0 |
|---|---|---|---|---|
| Exposed secrets | 18% | 41% | 38% | 25% |
| Missing RLS/auth on data access | 21% | 52% | 43% | 33% |
| No rate limiting | 82% | 93% | 95% | 92% |
| Client-side authorization only | 13% | 62% | 57% | 42% |
Cursor performed best on security, which makes sense: Cursor users tend to be experienced developers who know to ask for security measures. But even among Cursor projects, the baseline was low. Most developers didn't ask for rate limiting or comprehensive auth checks, so they didn't get them.
Lovable had the highest rate of exposed secrets and missing authorization. This aligns with the May 2025 audit finding that 10% of Lovable apps had exploitable vulnerabilities. The tool generates complete apps quickly but doesn't apply security defaults that match what production requires.
Error Handling: The Silent Failure Mode
78% of apps had no error boundaries. When a JavaScript error occurs in a React component, the entire application crashes to a white screen. Error boundaries catch these errors and display a fallback UI. Without them, a single rendering error in a single component takes down the whole app.
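In React the fix is a class component implementing getDerivedStateFromError/componentDidCatch, but the contract is simple enough to sketch framework-free: wrap a render function so that one component's failure produces a fallback instead of taking down the page. Names here are illustrative:

```javascript
// Framework-agnostic sketch of the error-boundary contract: a failing
// render yields a fallback rather than propagating and crashing the app.
function withBoundary(render, fallback) {
  return function (props) {
    try {
      return render(props);
    } catch (err) {
      // Contain the failure to this subtree and show a degraded UI.
      return fallback(err);
    }
  };
}
```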
58% were missing loading states on asynchronous operations. Buttons didn't disable while processing. Lists didn't show loading indicators while fetching. This leads to double-submissions, user confusion, and the perception that the app is broken.
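The double-submission half of this is cheap to guard against even before adding visual loading states. A sketch (oncePending is an illustrative name): wrap the submit handler so that re-entrant calls while a request is in flight are ignored:

```javascript
// Guard an async handler against double-submission: while one call is
// in flight, further calls return null instead of firing again.
function oncePending(fn) {
  let pending = false;
  return async function (...args) {
    if (pending) return null; // ignore re-entry while in flight
    pending = true;
    try {
      return await fn(...args);
    } finally {
      pending = false; // re-enable once the request settles
    }
  };
}
```

The same `pending` flag is what a UI would read to disable the button and show a spinner.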
44% had at least one async operation with no error handling at all. No try/catch, no .catch(), no error callback. If the API returns an error or the network is down, the app either silently fails or crashes.
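The guard is equally cheap: a small wrapper that converts thrown errors into values the calling code can branch on. `safeQuery` is an illustrative name, not something from the scanned repos:

```javascript
// Wrap any promise so failures come back as a value instead of escaping
// as an unhandled rejection. Callers always check `error` before using `data`.
async function safeQuery(promise) {
  try {
    const data = await promise;
    return { data, error: null };
  } catch (error) {
    return { data: null, error };
  }
}
```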
```js
// This pattern appeared in 44% of scanned apps
const { data } = await supabase.from('items').select('*');
// What if this fails? No error handling. No loading state.
// The user sees nothing, or the app crashes.
```

Testing: Nearly Nonexistent
72% of apps had zero test files. No unit tests, no integration tests, no end-to-end tests. Nothing.
Among the 28% that did have tests:
- 18% had only a single default test file (the one generated by create-next-app or similar scaffolding)
- 7% had meaningful tests covering some business logic
- 3% had comprehensive test coverage across auth, API, and core features
This finding isn't surprising, but it is alarming. These apps handle user data, process payments, and manage authentication. They're deployed to production with no automated way to verify they work correctly.
The absence of tests isn't just a code quality issue. It's a deployment risk. Without tests, every deployment is a manual QA session. If you're shipping updates regularly (and you should be), you're manually testing every critical flow every time, or you're not testing at all.
Deploy Configuration: Papered Over
34% of apps had ignoreBuildErrors enabled in their build configuration. This flag tells the build tool to skip TypeScript errors and ESLint warnings, producing a deployable artifact even when the code has known issues.
```js
// Found in 34% of next.config.js files
const nextConfig = {
  typescript: {
    ignoreBuildErrors: true, // This hides real problems
  },
  eslint: {
    ignoreDuringBuilds: true, // So does this
  },
};
```

This is the code-generation equivalent of putting tape over your car's check engine light. The problems don't go away. You just stop seeing them.
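The remedy is to delete the escape hatches and let the build fail loudly on real errors. A sketch of the stricter config, assuming a Next.js project (reactStrictMode is shown only as a common companion setting, not a requirement):

```javascript
// next.config.js without the overrides: with no ignoreBuildErrors or
// ignoreDuringBuilds set, type and lint errors fail the build instead
// of silently reaching production.
const nextConfig = {
  reactStrictMode: true, // surfaces unsafe patterns during development
};

module.exports = nextConfig;
```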
67% had no database migration files. The database schema was created manually through a dashboard UI, with no way to reproduce it, version it, or roll it back. If the database needs to be recreated (new environment, disaster recovery, team member onboarding), there's no automated way to do it.
85% were missing security headers. No HSTS, no X-Frame-Options, no Content-Security-Policy. These are one-time configuration items that protect against entire categories of attacks, and they're almost universally absent from AI-generated apps.
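For a Next.js app (an assumption about the stack; other frameworks have equivalents), the documented headers() hook in next.config.js sets all of these in one place. A sketch; tighten the CSP to your actual asset origins:

```javascript
// next.config.js sketch: apply baseline security headers to every route.
const securityHeaders = [
  { key: 'Strict-Transport-Security', value: 'max-age=63072000; includeSubDomains' },
  { key: 'X-Frame-Options', value: 'DENY' },
  { key: 'X-Content-Type-Options', value: 'nosniff' },
  { key: 'Content-Security-Policy', value: "default-src 'self'" },
];

module.exports = {
  async headers() {
    // '/(.*)' matches all routes, so the headers ship on every response.
    return [{ source: '/(.*)', headers: securityHeaders }];
  },
};
```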
The Score Distribution
We scored each app on a 0-100 scale across all categories. The distribution was striking:
- 0-20 (Critical): 23% of apps. Multiple security vulnerabilities, no tests, no error handling. Not safe to deploy.
- 21-40 (Poor): 38% of apps. Some basics covered but significant gaps in security and reliability.
- 41-60 (Fair): 27% of apps. Core functionality works but missing production hardening.
- 61-80 (Good): 10% of apps. Most categories addressed, some gaps remaining.
- 81-100 (Production Ready): 2% of apps. Comprehensive coverage across all categories.
Only 2% of the apps we scanned were production-ready as committed. The median score was 31 out of 100.
The correlation between tool and score was notable but not as strong as you might expect:
| Tool | Median Score | Range |
|---|---|---|
| Cursor | 42 | 12-87 |
| Lovable | 24 | 5-61 |
| Bolt | 21 | 8-54 |
| v0 | 35 | 15-72 |
Cursor projects scored highest on average, but the variance was enormous. The best Cursor project scored 87. The worst scored 12. The tool amplifies whatever the developer brings to it, for better and for worse.
What This Means
These findings don't mean AI coding tools are bad. They mean AI coding tools are building tools, not shipping tools. They're optimized for getting from zero to working prototype as fast as possible, and they're extraordinary at that job.
But working prototype and production-ready application are not the same thing. The gap between them is consistent, predictable, and measurable. It shows up in the same categories every time: security, error handling, testing, deploy configuration.
The builders who successfully ship aren't the ones with the best AI tools. They're the ones who recognize that the AI-generated code is a starting point and systematically address what it misses. The vibe coding approach works, but only when it's paired with a finishing pass.
The Path Forward
If you've built something with any AI tool and you're planning to ship it, the data suggests you should assume the following gaps exist until you've verified otherwise:
- Security: Your auth and authorization have gaps. Your secrets may be exposed. You have no rate limiting.
- Error handling: Your app will crash or silently fail on errors. Users will see broken states.
- Testing: You have no automated way to verify your app works correctly.
- Deploy config: Your build may be hiding errors. Your database schema isn't reproducible.
- Monitoring: When things break in production, you won't know until users complain.
You can address these manually with a production checklist. Or you can automate the audit.
FinishKit was built specifically for this gap. It scans your repo with multi-pass LLM analysis and generates a prioritized Finish Plan covering every category where AI-built apps consistently fall short. The scan takes about 2 minutes. See what your app scores.
The AI built your app. Now find out what it missed.