Being a Part of a Premier Data Science Research Hub
It has been a great experience to be a part of the D4 Institute at Iowa State University, from writing the research proposal to publishing novel results.
The D4 Institute is an interdisciplinary data science hub at Iowa State University where professors, graduate students, REU students, and researchers from Computer Science, Electrical Engineering, Mathematics, and Statistics collaborate to ensure the dependability of data science.
The D4 Institute took about four years to assemble and was then funded by an NSF TRIPODS grant in 2020. My advisor, Hridesh Rajan, leads the project as the PI. I have been involved with D4 since we began writing the NSF proposal, and I continue to contribute to the project as a graduate researcher.
The computer science magazine Atanasoff Today featured our work in the D4 Institute. The article is available here: https://www.cs.iastate.edu/atanasoff-today-piecing-together-premier-data-science-research-hub
Sumon Biswas (’21 computer science, Ph.D.) was immediately drawn to Iowa State’s computer science program, in part because of Rajan’s group. Research opportunities related to the data science field matched nicely with his career goals. Biswas was particularly drawn to the group’s commitment to researching the data science pipeline.
The central goal of the project is to ensure the dependability of data-driven software. With the growing interest in AI and machine learning, we need to focus on the safety, security, fairness, robustness, and other critical properties of such systems.
“My research interests are very specific and tailored. My career focus blends software engineering, programming languages and data science,” Biswas said. “The varied research opportunities at Iowa State, in particular with the D4 Institute, allowed me to become an entrepreneur of sorts and design my own career that fit with my goals.”
My advisor guided me through the process of delving into the area and making original contributions to the project. We are continuing to work in this area, blending software engineering and programming languages expertise to make AI- and ML-based systems more reliable.
Rajan has provided Biswas with a rich array of opportunities that have shaped his career path. In addition to engaging in cutting-edge research on the data science life cycle, Biswas provided significant contributions to the development of the successful TRIPODS NSF grant. He also attended the Midwest Big Data Summer School where he learned cutting-edge research methods that further drew him into studying the data science life cycle.
Specifically, I looked deeply into the data science pipeline, an ordered set of stages including data collection, exploratory analysis, data preprocessing, modeling, training, and evaluation, and studied the different properties of the pipeline.
“It’s been incredible,” Biswas said. “I’ve learned novel research ideas from D4 researchers and practitioners who have introduced me to studying the data science pipeline and its properties.”
I have already published my research on ensuring the fairness of machine learning models. The work analyzes different fairness measures and mitigation techniques, and their impact on real-world ML-based software.
Biswas is close to publishing his own research which he conducted at the D4 Institute. “It’s exciting to be involved in research that could improve software systems, which affect many people who are impacted by data-driven decisions,” he said.
Rajan and his team plan to hire additional undergraduates, graduate students and postdocs at the D4 Institute. More students, like Biswas, will benefit from the experience of conducting NSF-funded research and working with seasoned experts who collaborate on studies.
We now have a full-grown team of collaborators: undergraduate and graduate students, postdocs, industry partners, and professors with different areas of expertise. I have also mentored undergraduate students and collaborated with others, which has been a great way to gain further knowledge and share thoughts and ideas.
- Boa Meets Python: A Boa Dataset of Data Science Software in Python Language
- Our Research Identifies Unfairness in the Component Level of AI Based Software
- 23 Shades of Self-Admitted Technical Debt: An Empirical Study on Machine Learning Software
- Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline
- Do the Machine Learning Models on a Crowd Sourced Platform Exhibit Bias? An Empirical Study on Model Fairness