2017
DOI: 10.1089/big.2016.0048

Conscientious Classification: A Data Scientist's Guide to Discrimination-Aware Classification

Abstract: Recent research has helped to cultivate growing awareness that machine-learning systems fueled by big data can create or exacerbate troubling disparities in society. Much of this research comes from outside of the practicing data science community, leaving its members with little concrete guidance to proactively address these concerns. This article introduces issues of discrimination to the data science community on its own terms. In it, we tour the familiar data-mining process while providing a taxonomy of co…

Cited by 154 publications (74 citation statements)
References 18 publications

“…Sample size, inclusion, and selection of participants will have an overall effect on the performance of the algorithms due to the common issue of underestimation that was faced in previous studies that have utilized ML. Although the sample is not representative of the population in Qatar, it would have been adequate for generalization of results and proper assessment of the performance of the algorithms if our study had a larger sample size [60, 61].…”
Section: Discussion (mentioning)
confidence: 99%
“…Underestimation occurs when a learning algorithm is trained on insufficient data and fails to provide estimates for interesting or important cases, instead approximating mean trends to avoid overfitting. 14 Low sample size and underestimation of minority groups are not unique to machine learning or electronic health record data but a common issue in other types of studies, such as randomized clinical trials and genetic studies. For instance, genetic studies have been criticized for not fully accounting for genetic diversity in non-European populations.…”
Section: Sample Size and Underestimation (mentioning)
confidence: 99%
“…Muller et al summarized and extended this analysis, concluding that humans may intervene between data and analysis in five ways, according to each data science worker's professional discernment: discovery of data, capture of data, design of data, curation of data, and even creation of data [48]. In these ways, data science workers engage deeply with data before data science modeling activities, during model selection [62,68] and after model refinement, such as when testing for bias [5,72].…”
Section: Data Science Teams and Disciplinary Diversity (mentioning)
confidence: 99%