2021
DOI: 10.1111/rssa.12762

Enhancing (Publications on) Data Quality: Deeper Data Minding and Fuller Data Confession

Abstract: Statistics typically treats data as inputs for analysis, whereas the broader data science enterprise deals with the entire data life cycle, including the phases that output data. This commentary argues that it would benefit statistics and (data) science if we statisticians were also to treat data as products in and of themselves, and accordingly subject them to data minding, a stringent quality inspection process that scrutinizes data conceptualization, data pre-processing, data curation and data provenance, i…

Cited by 5 publications (5 citation statements)
References 29 publications
“…Indeed, Lin reminded us that the quality control should start earlier, that is, at the data collection stage, "addressing the problem before analysis even began." Although this was not a topic for my panel presentation in 2017 and hence it was not included in the article, it has been a central topic of my research since [22], which led to the proposal of "data minding" [26]. The main finding, as reported in [23], supports Lin's emphasis to its core: data quality matters far more than data quantity.…”
Section: It Is All About "Quality At Every Step" (mentioning)
confidence: 86%
“…Recently, I coined the term "data confession" [26] to encourage more disclosures in research publications about defects in data conceptualization, collection or pre-processing, as another component in enhancing the replicability and ultimately the reliability of published scientific studies, since data quality matters far more than data quantity [23,4,27,3]. The retrospective introspection summarized above suggests a more general data science confession (DSC) in our publications, where we can benefit from each others' mistakes and lessons learned, especially how we reason with ourselves, where we can engage in a pure intellectual dialogue without being distracted by suspicions of impure motivations.…”
Section: Taking a Lead In Data (Science) Confession (mentioning)
confidence: 99%
“…Their influence on estimation outcomes has been demonstrated, quantitatively, in the fields of fair machine learning [17], natural language processing [11], and psychology [46], to name a few. Yet, in general it is less common to encounter meaningful detail about the preprocessing stage in discussions about research outputs, than it is to learn about how the data were collected and modelled [32]. Preprocessing decisions often remain tucked away in code, either inaccessible or difficult to parse, limiting our ability to interpret and replicate results.…”
Section: Introduction (mentioning)
confidence: 99%
“…Communicating and documenting data preprocessing is one aspect of data provenance, a broader concept referring to all aspects of dataset production. An influx of interest in data provenance in the machine learning community has led to work exploring how we might better record and utilise information about a dataset's creation [12,19,26,27,32,38,43]. Preprocessing is mentioned in the provenance literature, but because there are many aspects of provenance, it receives limited attention.…”
Section: Introduction (mentioning)
confidence: 99%