2006
DOI: 10.1007/11871637_53

Efficient Name Disambiguation for Large-Scale Databases

Abstract: Name disambiguation can occur when one is seeking a list of publications of an author who has used different name variations and when there are multiple other authors with the same name. We present an efficient integrative framework for solving the name disambiguation problem: a blocking method retrieves candidate classes of authors with similar names and a clustering method, DBSCAN, clusters papers by author. The distance metric between papers used in DBSCAN is calculated by an online active selecti…
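
The blocking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which blocking key the authors use, so the key below (lowercased last name plus first initial), the function names, and the sample records are all assumptions made purely for illustration.

```python
from collections import defaultdict


def block_key(name):
    """Hypothetical blocking key: (last name, first initial), lowercased."""
    parts = name.replace(".", " ").split()
    return (parts[-1].lower(), parts[0][0].lower())


def block_by_name(records):
    """Group paper ids into candidate blocks of similar author names."""
    blocks = defaultdict(list)
    for paper_id, author in records:
        blocks[block_key(author)].append(paper_id)
    return dict(blocks)


# Made-up records: (paper id, author name as printed on the paper).
records = [
    ("p1", "Jane Smith"),
    ("p2", "J. Smith"),
    ("p3", "John Smith"),
    ("p4", "Ann Lee"),
]
blocks = block_by_name(records)
```

Here "Jane Smith", "J. Smith", and "John Smith" fall into the same block even though they may be different people. Blocking only narrows the candidates; separating the papers within a block is left to the clustering step.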

Cited by 114 publications (128 citation statements)
References 10 publications
“…• cF1 [7]: Combines the fraction of clusters from R that are also in S and the fraction of clusters from S in R.…”
Section: Existing Measures (mentioning)
confidence: 99%
“…The cluster F1 measure [7,2] counts clusters that exactly match and is defined as the harmonic mean of the cluster precision and cluster recall. The cluster precision is defined as |R∩S| / |R| while the cluster recall is defined as |R∩S| / |S|…”
Section: A2 Cluster-level Comparison (mentioning)
confidence: 99%
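
The cluster F1 definition quoted above can be written out as a short sketch. It assumes that R is the clustering being evaluated and S is the reference clustering, which is how the precision and recall denominators read. The function name and the sample clusterings are hypothetical.

```python
def cluster_f1(R, S):
    """Cluster-level F1: harmonic mean of |R∩S|/|R| and |R∩S|/|S|.

    R and S are iterables of clusters; a cluster counts as matched
    only if exactly the same set of items appears in both.
    """
    r = {frozenset(c) for c in R}
    s = {frozenset(c) for c in S}
    matched = len(r & s)
    if matched == 0:
        return 0.0
    precision = matched / len(r)
    recall = matched / len(s)
    return 2 * precision * recall / (precision + recall)


# Made-up example: only the cluster {p1, p2} matches exactly.
predicted = [{"p1", "p2"}, {"p3"}, {"p4", "p5"}]
reference = [{"p1", "p2"}, {"p3", "p4"}, {"p5"}]
score = cluster_f1(predicted, reference)
```

With one exact match out of three clusters on each side, precision and recall are both 1/3, so the score is 1/3. The measure gives no partial credit for clusters that are almost right.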
“…They use a mix of techniques. While some use similarity functions [2,7,12,18,21,27,30], others use learning techniques [1,14,16,28,32,35], heuristics [17,19,20,24], classifiers [9,10,34] and clustering methods [11,31].…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…For example, even if author A and author B are classified as one person, and author B and author C are also classified as one person, author A and author C may be classified as two different persons. By applying density-based spatial clustering of applications with noise (DBSCAN), a clustering algorithm based on the density reachability of data points, CiteSeerX resolves most of these inconsistent cases (Huang, Ertekin, and Giles 2006). The remaining small portion of ambiguous cases are those located at cluster boundaries.…”
Section: Author Disambiguation (mentioning)
confidence: 99%
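
The A/B/C inconsistency described in the statement above, and the way density reachability resolves it, can be shown with a small pure-Python DBSCAN. This is a textbook-style sketch, not the CiteSeerX implementation. The distance values, `eps`, and `min_pts` are made up for illustration.

```python
def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN over a precomputed symmetric distance matrix.

    Returns one label per point; -1 marks noise.
    min_pts counts the point itself.
    """
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1

    def neighbors(i):
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb)
    return labels


# Records A, B, C, D. A-B and B-C are close, A-C is not, D is unrelated.
dist = [
    [0.0, 0.3, 0.6, 0.9],
    [0.3, 0.0, 0.3, 0.9],
    [0.6, 0.3, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.0],
]
labels = dbscan(dist, eps=0.4, min_pts=2)
```

A pairwise threshold of 0.4 would call A and C different people while merging each with B. DBSCAN instead places A, B, and C in one cluster because C is density-reachable from A through B, and it marks D as noise.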