Sumon Biswas
Mining Software Repositories
What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants
An empirical study of 547 real-world operational safety failures in LLM-based coding agents, revealing a taxonomy of 33 risk types and showing that over 65% of incidents arise during routine bug fixing and configuration tasks.
Alif Al Hasan, Sumon Biswas
Cite
ArXiv
Are Prompt Engineering and TODO Comments Friends or Foes? An Evaluation on GitHub Copilot
We show that GitHub Copilot can generate code with the symptoms of self-admitted technical debt (SATD), both prompted and unprompted. Moreover, we demonstrate the tool’s ability to automatically repay SATD under different circumstances and qualitatively investigate the characteristics of successful and unsuccessful comments.
David OBrien, Sumon Biswas, Sayem Imtiaz, Rabe Abdalkareem, Emad Shihab, Hridesh Rajan
Cite
DOI
The Art and Practice of Data Science Pipelines: A Comprehensive Study of Data Science Pipelines In Theory, In-The-Small, and In-The-Large
This work aims to inform the terminology and practice of designing data science (DS) pipelines. Our investigation suggests that the DS pipeline is a widely used software architecture but is often built in an ad hoc manner. We demonstrate the importance of standardization and of an analysis framework for DS pipelines, following traditional software engineering research on software architecture and design patterns. We also contribute three representations of DS pipelines that capture the essence of our subjects in theory, in-the-small, and in-the-large, which would facilitate building new DS systems.
Sumon Biswas, Mohammad Wardat, Hridesh Rajan
Cite
DOI
Talk
23 Shades of Self-Admitted Technical Debt: An Empirical Study on Machine Learning Software
We provide a comprehensive taxonomy of machine learning SATDs. Our study analyzes ML SATD type organizations, their frequencies within stages of ML software, the differences between ML SATDs in applications and tools, and the effort of ML SATD removals. The findings suggest implications for ML developers and researchers to create maintainable ML systems.
David OBrien, Sumon Biswas, Sayem Imtiaz, Rabe Abdalkareem, Emad Shihab, Hridesh Rajan
Cite
DOI
Boa Meets Python: A Boa Dataset of Data Science Software in Python Language
The popularity of the Python programming language has surged in recent years due to its increasing use in Data Science. The availability of Python repositories on GitHub presents an opportunity for mining software repository research, e.g., suggesting best practices for developing Data Science applications, identifying bug patterns, and recommending code enhancements. To enable this research, we have created a new dataset that includes 1,558 mature GitHub projects that develop Python software for Data Science tasks.
Sumon Biswas, Md Johirul Islam, Yijia Huang, Hridesh Rajan
Cite
Dataset
DOI
Slides