Self‐training author name disambiguation for information scarce scenarios

Ferreira, Anderson A.; Veloso, Adriano; Gonçalves, Marcos André; Laender, Alberto H. F.

doi:10.1002/asi.22992

Cited by 32 publications

(38 citation statements)

References 37 publications

“…Step 1 aims at producing pure clusters for the next steps using the author and coauthor attributes, as in [1,5]. We use coauthorship relations among citations in order to group together those belonging to the same author.…”

Section: Proposed Methods (mentioning)

confidence: 99%

Combining Classifiers and User Feedback for Disambiguating Author Names

Souza

Ferreira

Gonçalves

2015

Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries

Self Cite

Historically, supervised methods have been the most effective ones for author name disambiguation tasks. Here, we propose a specific way to combine supervised techniques with user feedback. Although we use supervised techniques, the only user effort is to provide feedback on results, since the initial training data is automatically generated. Our experiments show gains of up to 20% in disambiguation performance against representative baselines.

“…Several existing works (Cota et al, ; Fan et al, ; Ferreira et al, ; Santana et al, ) report that overlapping coauthors between a pair of records with author name α is a strong indicator of author similarity. Thus, we initially generate an author‐similarity graph (a‐s graph) G^{(a)}, formed by connecting node (article) pairs with overlapping coauthors.…”

Section: Overview of the Proposed Methods (mentioning)

confidence: 99%
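
The graph construction described in this statement can be illustrated with a minimal sketch. It assumes each record is reduced to a set of coauthor names (excluding the ambiguous name itself) and links two records whenever those sets overlap. The function name, the record ids, and the sample data are hypothetical and are not taken from the cited paper, which may weight or prune edges differently.

```python
from itertools import combinations


def build_author_similarity_graph(records):
    """Link records that share at least one coauthor.

    `records` maps a record id to the set of coauthor names on that record,
    excluding the ambiguous author name itself (an assumption of this sketch).
    Returns an adjacency mapping: record id -> set of neighbouring record ids.
    """
    graph = {rid: set() for rid in records}
    for a, b in combinations(records, 2):
        # A non-empty intersection means the two records overlap in coauthors.
        if records[a] & records[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical records that all carry the same ambiguous author name.
records = {
    "r1": {"m. goncalves", "a. laender"},
    "r2": {"a. laender"},
    "r3": {"j. tang"},
}
graph = build_author_similarity_graph(records)
```

Here `r1` and `r2` end up connected because they share a coauthor, while `r3` stays isolated. The pairwise loop is quadratic in the number of records, which is acceptable for the records of a single ambiguous name but would need an inverted index on coauthors for larger inputs.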

“…A diverse range of techniques have been applied to the AND problem such as supervised approaches (support vector machines and naive Bayes: Han, Giles, Zha, Li, and Tsioutsiouliklis []), unsupervised approaches (Ferreira, Veloso, Gonçalves, & Laender, ; Khabsa, Treeratpituk, & Giles, ), graph‐based models (Markov random field: Tang, Fong, Wang, and Zhang []; factor graph model: Wang, Tang, Cheng, and Philip []), heuristic‐based solutions (Cota, Ferreira, Nascimento, Gonçalves, & Laender, ; Santana, Gonçalves, Laender, & Ferreira, ). Ferreira et al (2014), Liu, Li, Huang, and Fang (), and Cota et al (2010) have proposed grouping/clustering the records using coauthors, title, and venue.…”

Section: Introduction (mentioning)

confidence: 99%

A Graph Combination With Edge Pruning‐Based Approach for Author Name Disambiguation

Pooja

Mondal

Chandra

2019

Journal of the Association for Information Science and Technology

Author name disambiguation (AND) is a challenging problem due to several issues such as missing key identifiers, the same name corresponding to multiple authors, and inconsistent representation. Several techniques have been proposed, but maintaining consistent accuracy levels over all data sets is still a major challenge. We identify two major issues associated with the AND problem. First, the namesake problem, in which two or more authors with the same name publish in a similar domain. Second, the diverse topic problem, in which one author publishes in diverse topical domains with a different set of coauthors. In this work, we initially propose a method named ATGEP for AND that addresses the namesake issue. We evaluate the performance of ATGEP using various ambiguous name references collected from the Arnetminer Citation (AC) and Web of Science (WoS) data sets. We empirically show that the two aforementioned problems, which are difficult to handle using state‐of‐the‐art techniques, are crucial to addressing the AND problem. To handle the diverse topic issue, we extend ATGEP to a new variant named ATGEP‐web that considers external web information about the authors. Experiments show that, with enough information available from external web sources, ATGEP‐web can significantly improve the results compared with ATGEP.

“…According to our taxonomy, the methods may be classified following the main type of exploited approach: author grouping [3,9,11,12,14,13,16,17], which tries to group the references to the same author using some type of similarity among reference attributes, or author assignment [1,4,8,10,15,18], which aims at directly assigning the references to their respective authors. Alternatively, the methods may be grouped according to the evidence explored in the disambiguation task: the citation attributes (only), web information, or implicit data that can be extracted from the available information.…”

Section: Proposed Taxonomy (mentioning)

confidence: 99%

“…HHC disambiguates a set of citation records by successively fusing clusters of citation records with similar author names based on a real-world heuristic applied to their citation attributes. Then, we present SAND (Self-training Associative Name Disambiguator) [9,8]. SAND is a three-step self-training method for author name disambiguation that requires no manual labeling and no parameterization (in real world scenarios).…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Methods for Disambiguating Author Names in Bibliographic Data Repositories

Ferreira

Gonçalves

Laender

2015

Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries

Self Cite

Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. This problem occurs when an author publishes works under distinct names or distinct authors publish works under similar names. The challenges of dealing with author name ambiguity have led to a myriad of name disambiguation methods. In this tutorial, we characterize such methods by means of a proposed taxonomy, present an overview of some of the most representative ones and discuss open challenges.
