Proceedings of the First Workshop on Scholarly Document Processing 2020
DOI: 10.18653/v1/2020.sdp-1.8

Learning CNF Blocking for Large-scale Author Name Disambiguation

Abstract: Author name disambiguation (AND) algorithms identify a unique author entity record from all similar or same publication records in scholarly or similar databases. Typically, a clustering method is used that requires calculation of similarities between each possible record pair. However, the total number of pairs grows quadratically with the size of the author database, making such clustering difficult for millions of records. One remedy is a blocking function that reduces the number of pairwise similarity calcu…
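The effect of blocking on the number of pairwise comparisons can be illustrated with a minimal sketch. It assumes a hypothetical blocking key (first initial plus last name) and made-up author names; it is not the CNF blocking method learned in the paper, only a generic key-based blocker that shows why restricting comparisons to records within the same block shrinks the quadratic pair count.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(name: str) -> str:
    """Hypothetical key (an assumption): first initial plus last name, lowercased."""
    parts = name.lower().split()
    return f"{parts[0][0]} {parts[-1]}"


def all_pairs(records):
    """Every unordered record pair: n * (n - 1) / 2 candidates."""
    return list(combinations(range(len(records)), 2))


def blocked_pairs(records):
    """Only pairs whose records share the same blocking key."""
    blocks = defaultdict(list)
    for idx, name in enumerate(records):
        blocks[blocking_key(name)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Illustrative records only, not taken from any dataset.
records = ["Wei Wang", "W. Wang", "Wei Wang", "Jing Li", "J. Li", "Anna Smith"]
print(len(all_pairs(records)), "pairs without blocking")
print(len(blocked_pairs(records)), "pairs with blocking")
```

A key this coarse trades recall for speed: records of the same author whose names do not share the key are never compared, which is the kind of trade-off a learned blocking function is meant to manage.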

Cited by 5 publications (8 citation statements)
References 22 publications
“…Most of the studies treat the problem of author name ambiguity as an unsupervised task [18,36,17,17,26] using algorithms like DBSCAN [17] and agglomerative clustering [31]. Liu et al [21] and Kim et al [18] rely on the similarity between a pair of records with the same name to disambiguate author names on the PubMed dataset. Zhang et al [36] used Recurrent Neural Network to estimate the number of unique authors in the Aminer dataset.…”
Section: Unsupervised-based
Citation type: mentioning
confidence: 99%
“…These approaches rely on the matching between publications and authors which are verified either manually or automatically. Unsupervised approaches [21,18,5] have also been used to assess the similarity between a pair of papers. Other unsupervised approaches are also used to estimate the number of co-authors sharing the same name [36] and decide whether new records can be assigned to an existing author or a new one [26].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Most of the studies treat the problem of author name ambiguity as an unsupervised task [6,9,9,13,15] using algorithms like DBSCAN [9] and agglomerative clustering [21]. Liu et al [12] and Kim et al [13] rely on the similarity between a pair of records with the same name to disambiguate author names on the PubMed dataset. Zhang et al [15] used Recurrent Neural Network (RNN) to estimate the number of unique authors in the Aminer dataset.…”
Section: Unsupervised-based
Citation type: mentioning
confidence: 99%
“…In this work, we collected our dataset from the DBLP bibliographic repository. The DBLP version of July 2020 contains 5.4 million bibliographic records such as conference papers, articles, thesis, etc., from various fields of research.…”
Section: Dataset
Citation type: mentioning
confidence: 99%