2013
DOI: 10.1109/tkde.2012.60

Efficient and Effective Duplicate Detection in Hierarchical Data

Cited by 28 publications (29 citation statements)
References 19 publications
“…Our experiments are performed on four different datasets³: two synthetic datasets [12] (Country and Company) with sampled spelling errors and two real datasets (Restaurant and Tungsten). The Country and Company datasets contain 9 and 11 fields/features, respectively.…”
Section: Settings · mentioning · confidence: 99%
“…Also we will extend the naive Bayes classifier (referred to as HR-NBC) by introducing hierarchy restrictions between features. As discussed in previous work [11,12], these hierarchy restrictions are very useful to avoid unnecessary computation of field comparison, and to help refine the Bayesian network structure.…”
Section: Introduction · mentioning · confidence: 99%
“…Surveys [8,9] review the various approaches, including named attribute computations [5], schema mapping [2,17], and duplicate detection in hierarchical data [10], all of which inform the construction of profile linkage techniques.…”
Section: Record Linkage and Entity Resolution · mentioning · confidence: 99%
“…This field structure could be achieved by splitting unstructured/semi-structured addresses with address parsing. Moreover, there are hierarchical restrictions between these fields, which are useful to avoid unnecessary computation of field comparison [3,4]. These hierarchical restrictions can be mined from the semantic relationships between fields, which widely exist in real world record matching problems.…”
Section: Introduction · mentioning · confidence: 99%
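Several of the citing statements describe hierarchy restrictions between fields that let a matcher avoid unnecessary field comparisons. The minimal sketch below illustrates that idea only under stated assumptions. It is not the method of this paper or of the citing papers (for example, HR-NBC). The field hierarchy (country, state, city, street), the helper names `compare_with_restrictions` and `similarity`, and the 0.8 threshold are all hypothetical. Python's `difflib.SequenceMatcher` ratio stands in for whatever field comparator a real system would use. The sketch compares parent fields first and skips a child field whenever its parent did not match.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical field hierarchy: each field maps to its parent (None = root).
# A child field is compared only if its parent field matched.
HIERARCHY: Dict[str, Optional[str]] = {
    "country": None,
    "state": "country",
    "city": "state",
    "street": "city",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio, illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def _depth(field: str) -> int:
    """Number of ancestors of a field in HIERARCHY."""
    d, parent = 0, HIERARCHY[field]
    while parent is not None:
        d += 1
        parent = HIERARCHY[parent]
    return d


def compare_with_restrictions(
    r1: Dict[str, str], r2: Dict[str, str], threshold: float = 0.8
) -> Dict[str, Optional[bool]]:
    """Compare two records parent-first.

    Returns True/False for each compared field and None for each field whose
    comparison was skipped because its parent field did not match.
    """
    result: Dict[str, Optional[bool]] = {}
    # Sorting by depth guarantees every parent is decided before its children.
    order: List[str] = sorted(HIERARCHY, key=_depth)
    for field in order:
        parent = HIERARCHY[field]
        if parent is not None and result[parent] is not True:
            result[field] = None  # restriction applies: comparison skipped
            continue
        result[field] = similarity(r1[field], r2[field]) >= threshold
    return result


if __name__ == "__main__":
    a = {"country": "Germany", "state": "Bavaria", "city": "Munich", "street": "Leopoldstr. 1"}
    b = dict(a, country="France")
    outcome = compare_with_restrictions(a, b)
    skipped = sum(v is None for v in outcome.values())
    print(outcome, "skipped comparisons:", skipped)
```

Here a mismatch on `country` prunes the three descendant comparisons. On records with deep hierarchies, this kind of pruning is the saving the quoted statements refer to. A learned matcher could then treat the skipped fields as missing evidence rather than as explicit disagreements.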