What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants

Abstract

Autonomous coding agents built on large language models (LLMs) are rapidly being integrated into development workflows, yet their operational safety remains poorly understood outside evaluations of explicitly malicious inputs. We present an empirical study of real-world operational safety failures that triangulates two evidence sources: we screen 68,816 papers from 22 premier venues and mine 16,586 GitHub issues from LLM-powered coding tools, confirming 547 genuine safety failures. We develop a taxonomy of 33 operational risk types across seven dimensions. Our analysis shows that over 65% of incidents arise during bug-fixing and setup or configuration tasks. The dominant failure types include constraint violations, destructive operations, and authorization bypasses, and 326 of the 547 incidents (about 60%) were rated high or critical severity. These findings indicate that current safety evaluations inadequately address real-world deployment risks that lie beyond adversarial attacks, and that effective safeguards must extend to environmental constraint enforcement, failure transparency mechanisms, and safe-halt capabilities.

Related