Sumon Biswas
ML Pipeline
The Art and Practice of Data Science Pipelines: A Comprehensive Study of Data Science Pipelines In Theory, In-The-Small, and In-The-Large
This work attempts to inform the terminology and practice for designing data science (DS) pipelines. Our investigation suggests that the DS pipeline is a widely used software architecture but is often built in an ad hoc manner. We demonstrated the importance of standardization and of an analysis framework for DS pipelines, following traditional software engineering research on software architecture and design patterns. We also contributed three representations of DS pipelines, capturing the essence of our subjects in theory, in-the-small, and in-the-large, which can facilitate building new DS systems.
Sumon Biswas, Mohammad Wardat, Hridesh Rajan
Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline
We introduced a causal method of fairness to reason about the fairness impact of data preprocessing stages in an ML pipeline. We leveraged existing metrics to define the fairness measures of the stages. We then conducted a detailed fairness evaluation of the preprocessing stages in 37 pipelines collected from three different sources.
Sumon Biswas, Hridesh Rajan
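The stage-level fairness idea in the abstract above can be illustrated with a toy example. The sketch below is not the paper's implementation. It assumes statistical parity difference (SPD) as the fairness metric, uses a fixed score threshold in place of a trained model, and assumes that stages transform records one-to-one. Under those assumptions, a stage's impact is measured as the change in SPD when that single stage is removed from the pipeline. The stages `clip_scores` and `shift_unprivileged` and the data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Stage = Callable[[List[Record]], List[Record]]


def statistical_parity_difference(preds: List[int], groups: List[int]) -> float:
    """SPD = P(pred=1 | group=0, unprivileged) - P(pred=1 | group=1, privileged)."""
    unpriv = [p for p, g in zip(preds, groups) if g == 0]
    priv = [p for p, g in zip(preds, groups) if g == 1]
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


def run_pipeline(stages: List[Stage], data: List[Record],
                 threshold: float) -> Tuple[List[int], List[int]]:
    """Apply the preprocessing stages in order, then a fixed threshold 'model'.

    Assumption: stages neither drop nor reorder records.
    """
    for stage in stages:
        data = stage(data)
    preds = [1 if r["score"] >= threshold else 0 for r in data]
    groups = [int(r["group"]) for r in data]
    return preds, groups


def stage_fairness_impact(stages: List[Stage], index: int,
                          data: List[Record], threshold: float) -> float:
    """SPD of the full pipeline minus SPD of the pipeline without stage `index`."""
    with_stage = statistical_parity_difference(*run_pipeline(stages, data, threshold))
    rest = stages[:index] + stages[index + 1:]
    without_stage = statistical_parity_difference(*run_pipeline(rest, data, threshold))
    return with_stage - without_stage


# Hypothetical stages; each returns new records instead of mutating its input.
def clip_scores(data: List[Record]) -> List[Record]:
    return [{**r, "score": min(r["score"], 1.0)} for r in data]


def shift_unprivileged(data: List[Record]) -> List[Record]:
    return [{**r, "score": r["score"] + 0.2} if r["group"] == 0 else dict(r)
            for r in data]


if __name__ == "__main__":
    data = [{"score": 0.5, "group": 0}, {"score": 0.7, "group": 0},
            {"score": 0.6, "group": 1}, {"score": 0.8, "group": 1}]
    pipeline = [clip_scores, shift_unprivileged]
    print(stage_fairness_impact(pipeline, 1, data, threshold=0.65))  # 0.5
    print(stage_fairness_impact(pipeline, 0, data, threshold=0.65))  # 0.0
```

In this toy run, removing `shift_unprivileged` changes SPD from 0.5 to 0.0, so that stage alone accounts for the disparity, while `clip_scores` has no effect on these inputs. Comparing the pipeline with and without a stage keeps every other stage fixed, which is what lets the difference be attributed to that one stage.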