2012
DOI: 10.14778/2168651.2168656

A Bayesian approach to discovering truth from conflicting sources for data integration

Abstract: In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth finding problem. We observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth findin…

Cited by 299 publications (243 citation statements)
References 14 publications
“…It models the propagation of information trustworthiness from the known ground truths. Zhao et al adopt probabilistic graphical models in truth discovery tasks [36,37]. The existence of multiple truths for single entity is considered in [37] where source reliability is modeled as two-sided: sensitivity and specificity.…”
Section: Related Work (mentioning)
confidence: 99%
“…Zhao et al adopt probabilistic graphical models in truth discovery tasks [36,37]. The existence of multiple truths for single entity is considered in [37] where source reliability is modeled as two-sided: sensitivity and specificity. Later, a model specially designed for numerical data is proposed in [36].…”
Section: Related Work (mentioning)
confidence: 99%
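
The citing statements describe source reliability as two-sided: sensitivity and specificity. As a rough illustration only, the sketch below computes both quantities for a single source when the true labels are already known. This is a supervised simplification and an assumption for the example; the cited work is described as inferring truth and source quality without supervision. The function name `source_quality` and the data layout are hypothetical.

```python
from typing import Dict, Set, Tuple


def source_quality(claimed: Set[str], truth: Dict[str, bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one source.

    claimed: the facts this source asserts (hypothetical layout).
    truth:   every candidate fact mapped to whether it is actually true.
    """
    tp = sum(1 for fact, is_true in truth.items() if is_true and fact in claimed)
    fn = sum(1 for fact, is_true in truth.items() if is_true and fact not in claimed)
    tn = sum(1 for fact, is_true in truth.items() if not is_true and fact not in claimed)
    fp = sum(1 for fact, is_true in truth.items() if not is_true and fact in claimed)

    # Sensitivity: share of true facts the source reports (few false negatives).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of false facts the source omits (few false positives).
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data: two true facts and two false facts about one entity.
truth = {"a": True, "b": True, "c": False, "d": False}
print(source_quality({"a", "b", "c"}, truth))
```

A source that asserts many facts can reach high sensitivity while having low specificity, which is why a single accuracy number may not describe source quality when an entity can have multiple true values.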
“…Hiding dependencies among data items and denying write access to some data items are the two methods to prevent the attacks. Papers like [10] and [11] propose a probabilistic graphical model that can automatically infer true records and source quality in cloud data without any supervision. They leverage a generative process of two types of errors (false positive and false negative) by modeling two different aspects of source quality.…”
Section: Related Work (mentioning)
confidence: 99%
“…The participating sources collaborate to claim their own observations, such as facts and labels, on these objects. Our goal is to aggregate these collective observations to infer the true values (e.g., the true fact and image label) for the different objects [18,14,5].…”
Section: Introduction (mentioning)
confidence: 99%