2014
DOI: 10.1002/asi.22992

Self‐training author name disambiguation for information scarce scenarios

Abstract: We present a novel 3-step self-training method for author name disambiguation, SAND (self-training associative name disambiguator), which requires no manual labeling, no parameterization (in real-world scenarios) and is particularly suitable for the common situation in which only the most basic information about a citation record is available (i.e., author names, and work and venue titles). During the first step, real-world heuristics on coauthors are able to produce highly pure (although fragmented) clusters. T…

Cited by 32 publications (38 citation statements)
References: 37 publications
“…Step 1 aims at producing pure clusters for the next steps using the author and coauthor attributes, as in [1,5]. We use coauthorship relations among citations in order to group together those belonging to a same author.…”
Section: Proposed Methods | Citation type: mentioning
confidence: 99%
“…Several existing works (Cota et al, ; Fan et al, ; Ferreira et al, ; Santana et al, ) report that overlapping coauthors between a pair of records with author name α is a strong indicator of author similarity. Thus, we initially generate an author‐similarity graph (a‐s graph) Ga, formed by connecting node (article) pairs with overlapping coauthors.…”
Section: Overview Of the Proposed Methods | Citation type: mentioning
confidence: 99%
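
The coauthor-overlap idea in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the record layout (a hypothetical dict from article id to a set of coauthor names), the function names `build_as_graph` and `connected_components`, and the use of connected components to read clusters off the graph are all assumptions made for illustration.

```python
from itertools import combinations


def build_as_graph(records):
    """Connect every pair of articles that share at least one coauthor.

    `records` is a hypothetical layout: article id -> set of coauthor
    names, with the ambiguous author name itself left out.
    """
    graph = {rid: set() for rid in records}
    for a, b in combinations(records, 2):
        if records[a] & records[b]:  # non-empty overlap of coauthor sets
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group articles reachable from one another (an assumed clustering rule)."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Made-up records sharing one ambiguous author name.
records = {
    "r1": {"A. Silva", "B. Costa"},
    "r2": {"B. Costa"},
    "r3": {"C. Lima"},
    "r4": {"C. Lima", "D. Souza"},
    "r5": set(),
}
clusters = connected_components(build_as_graph(records))
```

Records with no shared coauthor stay in singleton clusters, which matches the abstract's description of first-step clusters as pure but fragmented.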
“…A diverse range of techniques have been applied to the AND problem such as supervised approaches (support vector machines and naive Bayes: Han, Giles, Zha, Li, and Tsioutsiouliklis []), unsupervised approaches (Ferreira, Veloso, Gonçalves, & Laender, ; Khabsa, Treeratpituk, & Giles, ), graph‐based models (Markov random field: Tang, Fong, Wang, and Zhang []; factor graph model: Wang, Tang, Cheng, and Philip []), heuristic‐based solutions (Cota, Ferreira, Nascimento, Gonçalves, & Laender, ; Santana, Gonçalves, Laender, & Ferreira, ). Ferreira et al (2014), Liu, Li, Huang, and Fang (), and Cota et al (2010) have proposed grouping/clustering the records using coauthors, title, and venue.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…According to our taxonomy, the methods may be classified following the main type of exploited approach: author grouping [3,9,11,12,14,13,16,17], which tries to group the references to the same author using some type of similarity among reference attributes, or author assignment [1,4,8,10,15,18], which aims at directly assigning the references to their respective authors. Alternatively, the methods may be grouped according to the evidence explored in the disambiguation task: the citation attributes (only), web information, or implicit data that can be extracted from the available information.…”
Section: Proposed Taxonomy | Citation type: mentioning
confidence: 99%
“…HHC disambiguates a set of citation records by successively fusing clusters of citation records with similar author names based on a real-world heuristic applied to their citation attributes. Then, we present SAND - Self-training Associative Name Disambiguator [9,8]. SAND is a three-step self-training method for author name disambiguation that requires no manual labeling and no parameterization (in real world scenarios).…”
Section: Introduction | Citation type: mentioning
confidence: 99%