Code Generation

What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants

An empirical study of 547 real-world operational safety failures in LLM-based coding agents, revealing a taxonomy of 33 risk types and showing that over 65% of incidents arise during routine bug fixing and configuration tasks.

Alif Al Hasan, Sumon Biswas

Are Prompt Engineering and TODO Comments Friends or Foes? An Evaluation on GitHub Copilot

We show that GitHub Copilot can generate code with the symptoms of self-admitted technical debt (SATD), both prompted and unprompted. Moreover, we demonstrate the tool’s ability to automatically repay SATD under different circumstances and qualitatively investigate the characteristics of successful and unsuccessful comments.

David OBrien, Sumon Biswas, Sayem Imtiaz, Rabe Abdalkareem, Emad Shihab, Hridesh Rajan
