2016
DOI: 10.48550/arxiv.1611.04878
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

A Data Quality Metric (DQM): How to Estimate The Number of Undetected Errors in Data Sets

Abstract: Data cleaning, whether manual or algorithmic, is rarely perfect leaving a dataset with an unknown number of false positives and false negatives after cleaning. In many scenarios, quantifying the number of remaining errors is challenging because our data integrity rules themselves may be incomplete, or the available gold-standard datasets may be too small to extrapolate. As the use of inherently fallible crowds becomes more prevalent in data cleaning problems, it is important to have estimators to quantify the … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 20 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?