2012
DOI: 10.1145/2107536.2107538

Improving data quality by source analysis

Abstract: In many domains, data cleaning is hampered by our limited ability to specify a comprehensive set of integrity constraints to assist in identification of erroneous data. An alternative approach to improve data quality is to exploit different data sources that contain information about the same set of objects. Such overlapping sources highlight hot-spots of poor data quality through conflicting data values and immediately provide alternative values for conflict resolution. In order to derive a dataset of high qu…

Cited by 11 publications (6 citation statements)
References 51 publications
“…Some of these differences are rather extreme, for example, the gene gain or loss tree shows that the squirrel only has β-casein, while the rest of the caseins are absent. It is of paramount importance to note that comparative genomics data is only as good as the quality of the genome databases (Muller et al., 2003). Sequencing errors that are carried over to the actual genome database may be misleading and therefore result in inaccurate interpretation of results.…”
Section: Discussion (mentioning)
confidence: 99%
“…As a result, companies and entities increasingly rely on data to support decision making and gain a competitive advantage. To make informed and effective decisions, it is crucial to evaluate and guarantee the quality of the cybersecurity data [Muller, Freytag, and Leser 2012]. Even with the issues listed throughout this study, we were able to establish several criteria and analyses to mitigate possible impacts on the results. Nevertheless, it is necessary to study the best ways to gather, analyze, generate reports, and more trustworthy research regarding data breach incidents.…”
Section: Need Better Access To Data (mentioning)
confidence: 99%
“…Conflicts are resolved by defining a conflict resolution function, which should be declaratively specified [27]. As discussed by Müller et al [26], "conflicts between contradicting sources are often systematic, caused by some characteristic of the different sources". The goal for experts is to identify these characteristics and use their domain knowledge to resolve conflicts.…”
Section: Data Quality and Conflict Resolution (mentioning)
confidence: 99%
“…To resolve conflicts efficiently, experts require effective tool support that allows them to explore the data and understand its characteristics, then formulate resolution strategies and apply them to the data set. In doing so, the ability to "exploit different data sources that contain information about the same set of objects" is key [26].…”
Section: Data Quality and Conflict Resolution (mentioning)
confidence: 99%