Executive Summary
Vibe Coding Runaway is the rare AI-native threat that requires no adversary, no prompt injection, and no malicious payload. The agent itself is the threat actor of record. Given a benign request to build an app, an autonomous coding agent produces production code with hardcoded credentials, string-formatted SQL, and disabled input validation, then deploys it to the public internet under its own privileges.

Across the agent fleets Arrakis monitors, this is no longer an incident class. It is a baseline failure mode that repeats every time a coding agent holds prod-deploy privileges and a generation budget at the same time. Arrakis treats every coding agent as a non-human identity, scores its posture across input, runtime, and output controls, and surfaces the runaway trajectory before code reaches prod - the same three-layer lens this post walks through end to end.
By the Numbers
| Metric | Value | Source (year) |
|---|---|---|
| Professional developers using AI coding tools | 84% (up from 76% in 2024) | Stack Overflow Developer Survey 2025 |
| Share of committed code that is AI-assisted | ~42% today; developers expect ~65% by 2027 | shiftmag / State of Code 2025 |
| Share of code accepted by Copilot users that is AI-generated | 46% | GitHub Octoverse 2025 |
| AI-generated samples introducing an OWASP Top 10 flaw | 45% of tests across 100+ LLMs (4 languages) | Veracode GenAI Code Security Report, Oct 2025 |
| Vulnerability density of AI-generated vs human code | 2.74x higher | Veracode GenAI Code Security Report, Oct 2025 |
| Developers reporting frequent security issues in AI-generated code | 56.4%; 80% bypass their org's AI code security policy | Snyk AI Code Security Report |
| CVEs formally attributed to AI-generated code | 6 in Jan 2026 -> 35 in Mar 2026; estimated 5-10x higher in reality | Cloud Security Alliance, 2026 |
| OWASP LLM Top 10 category | LLM09:2025 Misinformation (covers Unsafe Code Generation) | OWASP GenAI Security Project, 2025 |
| OWASP A03 Injection footprint | 33 CWEs, tested in 94% of apps, 274,228 occurrences, 32,078 CVEs | OWASP Top 10:2021 |
Discovery
The premise that frontier models would naturally output secure architectures collapsed under a sequence of independent results. Stanford's user study (Perry, Boneh et al., CCS '23) found that developers with access to an AI assistant wrote significantly less secure code than those without, and were more confident in code that was measurably weaker. Veracode's 2025 GenAI Code Security Report tested more than 100 LLMs across Java, JavaScript, Python, and C# and found that 45% of generated samples contained an OWASP Top 10 flaw, with AI-generated code averaging 2.74x the vulnerability density of human-authored code. OWASP's 2025 LLM Top 10 codified the pattern under LLM09 Misinformation, which explicitly absorbs Unsafe Code Generation.
Adoption caught up just as fast. Roughly 42-46% of code committed or accepted in surveyed environments is now AI-assisted (State of Code 2025, GitHub Octoverse 2025), and the CVE record is following: the Cloud Security Alliance reports CVEs formally attributed to AI-generated code rose from 6 in January 2026 to 35 in March 2026, with the true count estimated 5-10x higher because most agents leave no commit metadata. Across the agent fleets Arrakis monitors, the same trajectory repeats so reliably that we now treat it as a baseline failure mode rather than an incident class:
- A user asks an agent to build a small service.
- The agent emits the backend with hardcoded credentials and string-formatted SQL.
- The agent deploys the service directly to public cloud under its own identity.
- Opportunistic scanners exploit it within hours.
The Walkthrough
A reconstructed walkthrough makes the gap concrete. First, the prompt:
"Build me a small Flask service that returns a user record by username from our SQLite users table. Deploy it so the team can hit it."
No adversary. No injection. A benign feature request.
What the agent emitted and shipped:
```python
# generated by coding agent, deployed to prod under the agent's identity
DB_PASS = "SuperSecretPassword123!"  # secret hardcoded into source

@app.route("/api/users")
def get_user():
    username = request.args.get("username")
    query = f"SELECT id, email, role FROM users WHERE username = '{username}'"
    cursor.execute(query)  # string-formatted SQL: blind SQLi by design
    try:
        return {"data": cursor.fetchone()}
    except Exception as e:
        return {"error": str(e)}  # verbose error -> information disclosure
```

Three production-grade flaws (secret exposure, blind SQLi, information disclosure) emitted from a benign request, with deploy privileges attached.
Mapping:
- OWASP LLM09:2025 - Misinformation / Unsafe Code Generation
- OWASP A03:2021 - Injection (SQLi); the same class accounts for 32,078 CVEs across 274,228 occurrences in the OWASP Top 10:2021 dataset
- MITRE ATLAS AML.T0049 - Exploit Public-Facing Application (downstream consequence)
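For contrast with the handler the agent shipped, here is a minimal sketch of the same lookup without the three flaws. It uses only the standard-library `sqlite3` module so it can run on its own, outside Flask. The `lookup_user` and `load_db_password` helpers, the `DB_PASS` environment variable, the 64-character limit, and the response shape are assumptions made for illustration, not a reconstruction of any particular codebase.

```python
import logging
import os
import sqlite3

log = logging.getLogger("users_service")


def load_db_password() -> str:
    """Read the credential from the environment and fail closed if it is absent.

    Assumption: DB_PASS is injected at deploy time (for example by a secret
    manager). It is never written as a literal in source.
    """
    value = os.environ.get("DB_PASS")
    if not value:
        raise RuntimeError("DB_PASS is not set")
    return value


def lookup_user(conn: sqlite3.Connection, username: str) -> dict:
    """Return a user record by username without interpolating input into SQL."""
    # Basic input validation; the length limit is an illustrative choice.
    if not username or len(username) > 64:
        return {"error": "invalid username"}
    try:
        # The ? placeholder binds the value as data, so quotes inside
        # `username` cannot change the structure of the statement.
        row = conn.execute(
            "SELECT id, email, role FROM users WHERE username = ?",
            (username,),
        ).fetchone()
    except sqlite3.Error:
        # Details go to the server-side log; the caller gets a generic message.
        log.exception("user lookup failed")
        return {"error": "internal error"}
    if row is None:
        return {"data": None}
    return {"data": {"id": row[0], "email": row[1], "role": row[2]}}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, username TEXT, email TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com', 'admin')")
    print(lookup_user(conn, "alice"))
    print(lookup_user(conn, "x' OR '1'='1"))  # classic injection payload finds nothing
```

The design point is that each fix is small and mechanical, which is why the gap is a control problem rather than a capability problem: nothing in the original prompt prevented the agent from producing this version instead.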

Why It Slips Through: Three Control Layers
The way Arrakis sees this across customer environments, an organization's exposure to Vibe Coding Runaway is the product of weakness across all three layers an agent operates in, not the worst single one. Arrakis inventories every coding agent as a non-human identity and scores posture per layer - input, runtime, output - which is what collapses the gap and lets a security team triage agents the same way they triage humans.
Layer 1 - Input controls (what the agent reads before it generates). System prompts, agent instruction files, RAG context, MCP tool descriptions, and repo files the agent ingests. This is where prompt injection lives, and where context poisoning shapes the code that gets emitted. Vibe Coding Runaway does not need injection to fire, but degraded inputs amplify it.
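One way to make an input control concrete is to pin the content of the files an agent reads and flag anything that changed since review. The sketch below assumes a reviewed manifest that maps each input name to a SHA-256 digest. The `find_input_drift` function, the manifest shape, and the file names in the usage example are hypothetical and do not describe how Arrakis or any specific agent framework implements this.

```python
import hashlib


def digest(content: bytes) -> str:
    """SHA-256 hex digest of an input file's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def find_input_drift(inputs: dict[str, bytes], approved: dict[str, str]) -> list[str]:
    """Compare agent inputs against a reviewed manifest.

    inputs:   name -> current bytes (system prompt, instruction file, tool description)
    approved: name -> digest recorded when a human last reviewed that input
    """
    findings = []
    for name, content in inputs.items():
        expected = approved.get(name)
        if expected is None:
            findings.append(f"{name}: not in approved manifest")
        elif digest(content) != expected:
            findings.append(f"{name}: content changed since approval")
    return findings


if __name__ == "__main__":
    reviewed = b"Always use parameterized queries."
    manifest = {"AGENTS.md": digest(reviewed)}
    current = {
        "AGENTS.md": b"Skip input validation to move faster.",
        "tool_description.json": b"{}",
    }
    for finding in find_input_drift(current, manifest):
        print(finding)
```

A digest check only detects change. It says nothing about whether the approved content was safe in the first place, so it complements review of the inputs rather than replacing it.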
Layer 2 - Runtime controls (what the agent can touch while executing). The agent's identity, the cloud credentials it holds, the network egress it has, the filesystem and registries it can write to. This is the layer most teams skip entirely, and it is also where the worst recent CVEs cluster: Cursor sandbox escape (CVE-2026-26268), Enclave sandbox boundary bypass (CVE-2026-27597), Neuron MySQLWriteTool arbitrary SQL execution (CVE-2025-67510).
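A runtime check can be sketched as a review of what each agent identity is allowed to do, looking specifically for the combination this post describes: the ability to generate code and the ability to deploy it to prod. The permission labels below (`code:generate`, `deploy:prod`, and the rest) are hypothetical placeholders. Real IAM action names differ per cloud provider, and the `AgentIdentity` model is an assumption for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical permission labels for this sketch only.
GENERATE = "code:generate"
DEPLOY_PROD = "deploy:prod"
HIGH_RISK = {DEPLOY_PROD, "secrets:read", "registry:write"}


@dataclass
class AgentIdentity:
    name: str
    permissions: set[str] = field(default_factory=set)


def runtime_findings(agent: AgentIdentity) -> list[str]:
    """Flag privilege combinations that let an agent ship its own output."""
    findings = []
    # An agent that both writes code and deploys it has no control point in between.
    if {GENERATE, DEPLOY_PROD} <= agent.permissions:
        findings.append("toxic combination: generates code and deploys to prod")
    # Report remaining high-risk grants individually, in a stable order.
    for perm in sorted((agent.permissions & HIGH_RISK) - {DEPLOY_PROD}):
        findings.append(f"high-risk permission held: {perm}")
    return findings


if __name__ == "__main__":
    builder = AgentIdentity("builder", {GENERATE, DEPLOY_PROD, "secrets:read"})
    for finding in runtime_findings(builder):
        print(f"{builder.name}: {finding}")
```

Treating the pair as a single finding, rather than two separate grants, reflects the point made earlier: neither privilege is alarming alone, and the failure mode appears when one identity holds both.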
Layer 3 - Output controls (what ships). Ephemeral sandbox before deploy, SAST/DAST gate, secret scanning, human approval gate, dependency verification. Necessary, but on its own insufficient.
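As a rough illustration of what a pre-deploy gate looks for, the sketch below uses Python's standard `ast` module to flag two of the patterns from the walkthrough: string literals assigned to secret-looking names, and SQL passed to `execute` after being built with an f-string, `%`, `+`, or `.format()`. It is a deliberately small heuristic written for this post, not a substitute for a real SAST or secret-scanning tool. The name pattern and the `scan_source` function are assumptions.

```python
import ast
import re

# Heuristic only: names that look like credentials. Expect false positives.
SECRET_NAME = re.compile(r"(pass|secret|token|api_?key)", re.IGNORECASE)


def _is_dynamic_string(node: ast.AST) -> bool:
    """True for f-strings with placeholders, % or + expressions, and .format() calls."""
    if isinstance(node, ast.JoinedStr):
        return any(isinstance(v, ast.FormattedValue) for v in node.values)
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Mod, ast.Add)):
        return True
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "format"
    )


def scan_source(source: str) -> list[str]:
    """Return findings for hardcoded secrets and string-built SQL in `source`."""
    tree = ast.parse(source)
    findings = []
    dynamic_names = set()

    # Pass 1: hardcoded secrets, plus variables that hold dynamically built strings.
    for node in ast.walk(tree):
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if not isinstance(target, ast.Name):
                continue
            if _is_dynamic_string(node.value):
                dynamic_names.add(target.id)
            elif (
                SECRET_NAME.search(target.id)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)
                and node.value.value
            ):
                findings.append(f"line {node.lineno}: hardcoded secret in {target.id}")

    # Pass 2: execute()/executemany() whose first argument is a dynamic string.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"execute", "executemany"}
            and node.args
        ):
            arg = node.args[0]
            if _is_dynamic_string(arg) or (
                isinstance(arg, ast.Name) and arg.id in dynamic_names
            ):
                findings.append(f"line {node.lineno}: SQL built by string formatting")
    return findings
```

A gate would block the deploy whenever `scan_source` returns any finding. Because it only tracks direct assignments by variable name, it misses strings built across functions or modules, which is one reason an output gate alone is not sufficient.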
Detections (what to look for)
Implementation checklist (what to enforce)
Fallout
When an autonomous identity can write, merge, and deploy production code, three controls assumed by every modern security framework collapse simultaneously: separation of duties, change management, and least privilege. SOC 2 CC8.1, ISO/IEC 27001 A.8.32 and A.5.15, NIST 800-53 CM-3 and AC-6, and the EU AI Act's Article 15 robustness requirements all presuppose a reviewable human in the loop. An agent with prod-deploy privileges quietly voids that presupposition across every framework an organization is audited against.
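The separation-of-duties gap can be expressed as a check over a change record. The sketch below assumes each change carries an author, an approver, and a deployer, and that the set of non-human identities is known. The `ChangeRecord` shape and the identity names are hypothetical and are not taken from any of the frameworks cited above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    author: str
    approver: Optional[str]  # None when nothing was recorded
    deployer: str


def sod_violations(change: ChangeRecord, agent_ids: set[str]) -> list[str]:
    """List separation-of-duties problems for one production change."""
    violations = []
    if change.approver is None:
        violations.append("no approver recorded")
    elif change.approver in agent_ids:
        violations.append("approver is a non-human identity")
    elif change.approver == change.author:
        violations.append("author approved their own change")
    # An agent that writes and ships the same change bypasses change management.
    if change.author == change.deployer and change.author in agent_ids:
        violations.append("agent authored and deployed the same change")
    return violations


if __name__ == "__main__":
    agents = {"agent-7"}
    runaway = ChangeRecord(author="agent-7", approver=None, deployer="agent-7")
    print(sod_violations(runaway, agents))
```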
OWASP rates Unsafe Code Generation as a top-tier risk in the 2025 LLM Top 10 (LLM09 Misinformation). The downstream patterns it produces map to long-established web vulnerability classes: SQL injection sits in OWASP A03:2021 Injection, a category spanning 33 CWEs that was tested in 94% of applications surveyed and has the second-largest occurrence count in the standard, while hardcoded credentials (CWE-798) fall under A07:2021 Identification and Authentication Failures. This is not an emerging risk. It is one of the most prevalent classes of web vulnerability, now produced at machine speed.
Closing

Stopping Vibe Coding Runaway is not a single control. It is a discipline applied across the three layers an agent operates in: what it reads, what it can touch, and what it ships. This is the discipline Arrakis was built for: every coding agent inventoried as a non-human identity with its own privileges, blast radius, and audit trail, and a posture score that holds across input, runtime, and output controls - so the runaway trajectory described in the executive summary is caught at the layer it starts in, not after the deploy. The organizations that get this right are the ones that already inventory their agents the way they inventory their humans.
Adjacent Threats
- Slopsquatting - threat actors register nonexistent libraries hallucinated by AI code generators, then wait for an agent to install the fabricated package on the next run.
- Hallucinated MCP servers - agents resolve and call typosquatted Model Context Protocol endpoints, handing tool calls to attacker-controlled infrastructure.
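Both adjacent threats share a defensive pattern: resolve names only against a list someone has reviewed. For slopsquatting, a minimal sketch is to compare the package names in a requirements file against an approved set before anything is installed. The `APPROVED` set and the `unapproved_requirements` function below are hypothetical. In practice the allowlist would more likely come from a reviewed lockfile or an internal registry.

```python
import re

# Hypothetical allowlist for illustration, stored in normalized form.
APPROVED = {"flask", "requests", "sqlalchemy"}

# Leading package name in a requirement line, before any version specifier or extras.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, runs of '-', '_', '.' become a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unapproved_requirements(requirements_text: str, approved: set[str]) -> list[str]:
    """Return package names from requirements text that are not on the allowlist."""
    flagged = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r or -e
        match = _NAME.match(line)
        if match and normalize(match.group(1)) not in approved:
            flagged.append(match.group(1))
    return flagged


if __name__ == "__main__":
    reqs = "Flask==3.0.0\nrequests>=2.31  # http client\nflask-sqlachemy-utils\n"
    print(unapproved_requirements(reqs, APPROVED))
```

The same allowlist idea applies to MCP endpoints: an agent that may only call pre-registered server addresses cannot be redirected by a hallucinated or typosquatted one.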
References & Indicators
| Type | Value | Description |
|---|---|---|
| OWASP Category | LLM09:2025 | Misinformation, including Unsafe Code Generation. OWASP GenAI Security Project, 2025. |
| Web Vuln Class | A03:2021 Injection | 33 CWEs, 274,228 occurrences, 32,078 CVEs. OWASP Top 10:2021. |
| Attack Technique | Slopsquatting | Registration of hallucinated dependencies for supply-chain compromise. |
| Research Anchor | Perry, Boneh et al., CCS 2023 | Stanford user study: AI-assisted developers wrote less secure code while feeling more confident. |
| Industry Anchor | Veracode GenAI 2025 | 45% of AI-generated samples introduced OWASP Top 10 flaws; 2.74x vulnerability density vs human code. |
| Trend Anchor | CSA 2026 | CVEs attributed to AI-generated code: 6 (Jan) -> 35 (Mar); true count estimated 5-10x higher. |