Design and Architecture of Data Science Pipelines
We study, design, and analyze the architecture of data-science (DS) pipelines, which consist of stages such as preprocessing, modeling, training, and evaluation.
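To make the staged architecture concrete, here is a minimal sketch of a pipeline built as an ordered list of stages. Each stage reads from and extends a shared context dictionary. The names `Pipeline`, `preprocess`, `train`, and `evaluate`, the context keys, and the choice of a one-feature least-squares model are all hypothetical illustrations, not the project's design. Modeling and training are merged into a single stage, and for brevity evaluation runs on the training data. A real pipeline would evaluate on held-out data.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical stage signature: take the shared context, return it extended.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def preprocess(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Min-max scale the single feature column into [0, 1].
    xs = ctx["x"]
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0
    ctx["x_scaled"] = [(x - lo) / span for x in xs]
    return ctx


def train(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Modeling + training: fit y = a*x + b by ordinary least squares.
    xs, ys = ctx["x_scaled"], ctx["y"]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var if var else 0.0
    ctx["model"] = (a, my - a * mx)
    return ctx


def evaluate(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Mean squared error of the fitted line (on training data, for brevity).
    a, b = ctx["model"]
    errs = [(a * x + b - y) ** 2 for x, y in zip(ctx["x_scaled"], ctx["y"])]
    ctx["mse"] = sum(errs) / len(errs)
    return ctx


@dataclass
class Pipeline:
    stages: List[Stage] = field(default_factory=list)

    def run(self, ctx: Dict[str, Any]) -> Dict[str, Any]:
        # Execute stages strictly in order; each sees the previous stages' outputs.
        for stage in self.stages:
            ctx = stage(ctx)
        return ctx


pipeline = Pipeline([preprocess, train, evaluate])
result = pipeline.run({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]})
print(result["model"], result["mse"])
```

Keeping stages as plain functions over a shared context makes it easy to insert, reorder, or swap a stage (for example, a different model) without touching the others.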
ML Repo Dataset from GitHub
This dataset was created by mining 5M Python program snapshots from GitHub. The code is transformed into abstract syntax trees (ASTs) for static analysis.
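As an illustration of AST-based static analysis, the sketch below uses Python's standard `ast` module to extract two simple facts from a snapshot: which top-level modules it imports, and which functions or methods it calls. The function `analyze_snapshot` and the two features it collects are hypothetical choices; the dataset's actual analyses are not described here. Because mined snapshots may include code that Python 3 cannot parse (e.g. Python 2 `print` statements), unparseable sources return `None` instead of raising.

```python
import ast
from collections import Counter
from typing import Optional


def analyze_snapshot(source: str) -> Optional[dict]:
    """Hypothetical per-snapshot analysis: imported modules and called names."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # e.g. Python 2 code in the mined corpus; skip rather than fail.
        return None

    imports: Counter = Counter()
    calls: list = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import numpy.linalg as la" -> "numpy"
            for alias in node.names:
                imports[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from X.Y import Z" -> "X"; relative imports are skipped.
            imports[node.module.split(".")[0]] += 1
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):  # obj.method(...)
                calls.append(func.attr)
            elif isinstance(func, ast.Name):  # function(...)
                calls.append(func.id)
    return {"imports": imports, "calls": calls}


snapshot = '''
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("data.csv")
model = LogisticRegression()
model.fit(df[["x"]], df["y"])
'''

report = analyze_snapshot(snapshot)
print(report["imports"], sorted(report["calls"]))
```

Working on the AST rather than raw text means aliases, comments, and string contents do not produce false matches. Aggregating such per-snapshot reports across the corpus is one way to study which libraries and APIs data-science code relies on.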
Large-Scale Mining of Data-Science Software from GitHub
Mining and analyzing data-science repositories at scale can yield insights from their development history.