2014
DOI: 10.1140/epjds/s13688-014-0011-3

Exploiting citation networks for large-scale author name disambiguation

Abstract: We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individu…
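The abstract names the four ingredients of the pair-wise similarity (common co-authors, self-citations, shared references, shared citations), but in this truncated form it does not say how they are weighted or how the two clustering steps work. The Python sketch below is only an illustration under stated assumptions: the toy records, the weights in `WEIGHTS`, the cut-off `THRESHOLD`, treating "self-citation" as one publication citing the other, and reading "citations" as papers that cite both publications are all hypothetical. A single threshold-then-connected-components pass (single linkage) stands in for the paper's two-step agglomerative clustering.

```python
from itertools import combinations

# Hypothetical toy records for publications that carry the same ambiguous author name.
# "coauthors" excludes the ambiguous name itself; "refs" holds ids of cited papers.
PAPERS = {
    "p1": {"coauthors": {"Kim", "Park"}, "refs": {"x1", "x2", "x3"}},
    "p2": {"coauthors": {"Kim"}, "refs": {"p1", "x2", "x4"}},
    "p3": {"coauthors": {"Chen"}, "refs": {"y1", "y2"}},
}
# Hypothetical citing papers elsewhere in the corpus: id -> set of cited ids.
CITERS = {"c1": {"p1", "p2"}, "c2": {"p3"}}

# Hypothetical weights and cut-off; not the values used in the paper.
WEIGHTS = {"coauthors": 1.0, "self_cite": 1.0, "shared_refs": 0.5, "shared_cites": 0.5}
THRESHOLD = 1.0


def cited_by(pid):
    """Ids of corpus papers that cite publication `pid`."""
    return {c for c, refs in CITERS.items() if pid in refs}


def similarity(a, b):
    """Weighted sum of the four overlap signals named in the abstract."""
    pa, pb = PAPERS[a], PAPERS[b]
    # Assumed self-citation signal: one of the two publications cites the other.
    self_cite = int(a in pb["refs"] or b in pa["refs"])
    return (WEIGHTS["coauthors"] * len(pa["coauthors"] & pb["coauthors"])
            + WEIGHTS["self_cite"] * self_cite
            + WEIGHTS["shared_refs"] * len(pa["refs"] & pb["refs"])
            + WEIGHTS["shared_cites"] * len(cited_by(a) & cited_by(b)))


def cluster(ids, threshold=THRESHOLD):
    """Single-linkage grouping: link pairs scoring >= threshold, return connected components."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the union-find trees shallow
            i = parent[i]
        return i

    for a, b in combinations(ids, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=min)


clusters = cluster(sorted(PAPERS))
print(clusters)  # two clusters: {p1, p2} and {p3}
```

In this toy data p1 and p2 share a co-author, a reference and a citing paper, and p2 cites p1, so they score 3.0 and end up together, while p3 shares nothing and stays alone. Single linkage is used here only because it is the simplest agglomerative rule to write down; it can chain unrelated papers together through intermediate links, so the threshold matters a great deal.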

Cited by 56 publications (62 citation statements)
References 27 publications

“…Conversely, a given individual researcher could appear as several researchers because of misspellings. We acknowledge that we did not use algorithmic author name disambiguation [13]. The first type of error would lead to underestimating the number of potential reviewers and the second to overestimating the number of potential reviewers.…”
Section: Discussion | mentioning | confidence: 99%

“…We did not use any methods for author name disambiguation for researchers indexed under the same “LastName”, “ForeName” and “Initials”. [13, 14] We set N s to be equal to the number of publications for which we identified at least one author.…”
Section: Methods | mentioning | confidence: 99%

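The Methods excerpt above treats every distinct "LastName", "ForeName", "Initials" combination as one researcher, and the Discussion excerpt names the two resulting error types. A small Python sketch can make both visible. The name instances and the ground-truth person ids are hypothetical; ground truth is only available here because the data is made up.

```python
from collections import defaultdict

# Hypothetical name instances: (publication id, LastName, ForeName, Initials, true person id).
INSTANCES = [
    ("pub1", "Smith", "John", "J", "person_A"),
    ("pub2", "Smith", "John", "J", "person_B"),    # homonym: two people, one name key
    ("pub3", "Garcia", "Maria", "M", "person_C"),
    ("pub4", "Gracia", "Maria", "M", "person_C"),  # misspelling: one person, two keys
]


def group_by_name_key(instances):
    """Treat every distinct (LastName, ForeName, Initials) key as one researcher."""
    groups = defaultdict(set)
    for pub, last, fore, initials, _person in instances:
        groups[(last, fore, initials)].add(pub)
    return dict(groups)


def count_errors(instances):
    """Return (merged keys, split persons) relative to the toy ground truth."""
    persons_per_key = defaultdict(set)
    keys_per_person = defaultdict(set)
    for _pub, last, fore, initials, person in instances:
        key = (last, fore, initials)
        persons_per_key[key].add(person)
        keys_per_person[person].add(key)
    merged = sum(1 for persons in persons_per_key.values() if len(persons) > 1)
    split = sum(1 for keys in keys_per_person.values() if len(keys) > 1)
    return merged, split


researchers = group_by_name_key(INSTANCES)
merged, split = count_errors(INSTANCES)
print(len(researchers), "apparent researchers;", merged, "merged key(s),", split, "split person(s)")
# prints: 3 apparent researchers; 1 merged key(s), 1 split person(s)
```

With three true persons and three name keys the headcount happens to come out right, yet both errors are present: the two John Smiths collapse into one researcher, and the misspelled Maria Garcia is counted twice.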
“…Specifically, some scholars assume that if a name in paper X appears in paper Y that cites X, those two name instances appearing both in paper X and Y refer to the same author (Levin et al, 2012; Schulz, Mazloumian, Petersen, Penner, & Helbing, 2014;. Email address has also been used for the automatic construction of labeled data: two names associated with the same email address are assumed to represent the same author (e.g., Schulz et al, 2014). As shown in the example studies in Table 1, scholars have evaluated AND for digital libraries using various types of labeled data and evaluation metrics.…”
Section: Related Work | mentioning | confidence: 99%

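The two labeling heuristics quoted above (self-citation and shared email) can be sketched in Python as follows. Everything here is an assumption for illustration: a name instance is identified by the pair (publication id, name string), names are matched by exact string equality, and the corpus, emails and citation edges are toy data. Both rules yield only positive (same-author) pairs, and both can be wrong when two different people with identical names cite one another.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: publication id -> list of (author name string, email or None).
AUTHORS = {
    "X": [("J. Smith", "js@uni.example"), ("A. Chen", None)],
    "Y": [("J. Smith", None), ("B. Wu", None)],
    "Z": [("J. Smith", "js@uni.example")],
}
# Hypothetical citation edges: citing publication -> set of cited publications.
CITES = {"X": set(), "Y": {"X"}, "Z": set()}


def self_citation_pairs(authors, cites):
    """Pair name instances when the same name string occurs in a paper and in a paper it cites."""
    pairs = set()
    for citing, cited_set in cites.items():
        names_citing = {name for name, _email in authors[citing]}
        for cited in cited_set:
            if cited == citing:
                continue  # ignore a paper citing itself
            for name, _email in authors[cited]:
                if name in names_citing:
                    pairs.add(frozenset({(cited, name), (citing, name)}))
    return pairs


def email_pairs(authors):
    """Pair name instances that carry the same email address."""
    by_email = defaultdict(list)
    for pub, people in authors.items():
        for name, email in people:
            if email:
                by_email[email].append((pub, name))
    return {frozenset(pair) for instances in by_email.values()
            for pair in combinations(instances, 2)}


positives = self_citation_pairs(AUTHORS, CITES) | email_pairs(AUTHORS)
print(len(positives), "positive pairs")
```

Here Y cites X and both list "J. Smith", giving one self-citation pair; X and Z share an email address, giving one email pair.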
“…The performance of DBLP's disambiguation was tested on automatically labeled data (SelfCite hereafter) consisting of pairs of name instances in self-citation relation. These labeled data can be used for measuring recall defined as the ratio of the number of pairs matched by DBLP's disambiguation over the number of pairs in the labeled data (Liu et al, 2014; Schulz et al, 2014;. As this recall metric considers only name pairs in self-citation relation, it is different from the pairwise-F recall which considers all possible pairs among name instances referring to the same author.…”
Section: Automatically Labeled Data | mentioning | confidence: 99%

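The difference between the two recall measures in the excerpt above can be made concrete with a short Python sketch. The instance ids, cluster ids and labels are hypothetical, and "matched" is read as "both instances of a labeled pair receive the same predicted cluster", which is an interpretation of the quoted definition. Both functions assume at least one pair exists to avoid division by zero.

```python
from collections import defaultdict
from itertools import combinations


def same_cluster_pairs(assignment):
    """All unordered pairs of name instances that share a cluster id."""
    clusters = defaultdict(list)
    for instance, cluster_id in assignment.items():
        clusters[cluster_id].append(instance)
    return {frozenset(pair) for members in clusters.values()
            for pair in combinations(members, 2)}


def labeled_pair_recall(predicted, labeled_pairs):
    """Fraction of labeled pairs (e.g. self-citation pairs) placed in one predicted cluster."""
    return len(same_cluster_pairs(predicted) & labeled_pairs) / len(labeled_pairs)


def pairwise_recall(predicted, truth):
    """Fraction of all true same-author pairs placed in one predicted cluster."""
    true_pairs = same_cluster_pairs(truth)
    return len(same_cluster_pairs(predicted) & true_pairs) / len(true_pairs)


# Hypothetical example: a1, a2, a3 all belong to author A, but the system splits off a3.
truth = {"a1": "A", "a2": "A", "a3": "A"}
predicted = {"a1": "c1", "a2": "c1", "a3": "c2"}
self_cite_pairs = {frozenset({"a1", "a2"})}  # only a1-a2 is in a self-citation relation

print(labeled_pair_recall(predicted, self_cite_pairs))  # 1.0
print(pairwise_recall(predicted, truth))                # about 0.333
```

The self-citation recall is perfect because the only labeled pair stays together, while pairwise recall is 1/3 because splitting off a3 breaks two of the three true pairs; a system can therefore look better on SelfCite-style recall than on pairwise recall.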
“…Name homonymy causes a merge citation problem, whereas name synonymy leads to a split citation (Lee, On, Kang, & Park, ) problem. Such ambiguities deteriorate the performance of document retrieval, web search, and various bibliometric analysis like coauthorship link prediction (Wang & Sukthankar, ), collaboration network analysis (Viana, Amancio, & Costa, ), and citation network analysis (Amancio, Oliveira Jr, & da Fontoura Costa, ; Schulz, Mazloumian, Petersen, Penner, & Helbing, ).…”
Section: Introduction | mentioning | confidence: 99%