2007
DOI: 10.1109/tkde.2007.250581

Duplicate Record Detection: A Survey

Abstract: Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of transcription errors, incomplete information, lack of standard formats, or any combination of these factors. In this article, we present a thorough analysis of the literature on duplicate record detection. We cover similarity metrics that are commonly used to detect similar field…

citations
Cited by 1,353 publications
(1,079 citation statements)
references
References 73 publications
“…Surveys [8,9] review the various approaches, including named attributes computations [5], schema mapping [2,17] and duplicate detection in hierarchical data [10], all which inform the construction of profile linkage techniques.…”
Section: Record Linkage and Entity Resolution (mentioning)
confidence: 99%
“…Entity Linkage is the process that decides whether two descriptions refer to the same real world entity (see [12] for an overview). Actually, state-of-the-art methods from this area have also been reused and adapted in implementing entity search.…”
Section: Entity Search (mentioning)
confidence: 99%
“…PowerMap uses the Watson semantic search engine as a gateway to the SW. In addition, PowerMap can also query its own repositories and offers the capability to index and add new online ontologies. In the third step, the Triple Similarity Service (TSS) matches the QTs to ontological expressions.…”
Section: Motivating Scenario: Question Answering On the Semantic Web (mentioning)
confidence: 99%
“…Basic similarity metrics based on string comparison were developed in the database community (e.g., [16,3]). These metrics are used as a basis for the majority of algorithms, which compare values of attributes of different data instances and aggregate them to make a decision about two instances referring to the same entity (see [6] for a survey). The main distinction of our work is that, in the PowerAqua scenario, the fusion of answers is done in real time.…”
Section: Related Work (mentioning)
confidence: 99%