2019
DOI: 10.48550/arxiv.1912.09536
Preprint

Data Science through the looking glass and what we found there

Cited by 8 publications (15 citation statements)
References 0 publications
“…Model cards are still scarcely adopted in practice. In all of GitHub with millions of public notebooks [31,33,37] and many repositories sharing learning code and learned models, we found only 24 models documented explicitly with model cards. Our best effort on finding model cards published by companies results in only 28 models.…”
Section: Discussion (mentioning)
confidence: 99%
“…To do this, we analyzed over 480,000 pipelines with more than 920,000 operators, the result of searches carried out under a 5 minute execution time budget. For purposes of our analysis, we define an operator to be a single step in the pipeline, which can correspond to a data transformer or a predictor (a distinction presented in [49]). We calculate the amount of operators per pipeline for different sampling ratios, analyzing all the pipelines evaluated during the search procedure to account for changes during evolutions.…”
Section: RQ4: Pipeline Characteristics (mentioning)
confidence: 99%
“…For example, when we use a downsampling ratio of 0.0001 the average pipeline has 1.85 (0.30 sd) operators, while a full dataset results in an average pipeline with 1.60 (0.12 sd) operators. For context, a recent large scale pipeline analysis by Psallidas et al [49] found that most user-implemented scikit-learn (TPOT's target API) pipelines consist of 1-4 operators.…”
Section: RQ4: Pipeline Characteristics (mentioning)
confidence: 99%
“…Studies through empirical code analysis and qualitative studies offer different lenses into studying human-centered practices in developing ML workflows. Psallidas et al [18] analyzed publicly-available computational notebooks and enterprise data science code and pipelines to illustrate growing trends and usage behavior of data science tools. Other studies have employed qualitative, semi-structured interviews to study how different groups of users engage with ML development, including how software engineers [2] and non-experts [25] develop ML-based applications, and how ML practitioners iterate on their data in ML development [11].…”
Section: Related Work (mentioning)
confidence: 99%