Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries 2018
DOI: 10.1145/3197026.3197036

Effective Unsupervised Author Disambiguation with Relative Frequencies

Abstract: This work addresses the problem of author name homonymy in the Web of Science. Aiming for an efficient, simple and straightforward solution, we introduce a novel probabilistic similarity measure for author name disambiguation based on feature overlap. Using the researcher-ID available for a subset of the Web of Science, we evaluate the application of this measure in the context of agglomeratively clustering author mentions. We focus on a concise evaluation that shows clearly for which problem setups and at whi…
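The abstract only names the measure; as a loose illustration of the general idea of feature-overlap similarity weighted by relative frequencies, a minimal sketch might look like the following. The function name, the feature representation, and the "1 minus relative frequency" weighting are assumptions made here for illustration, not the formula from the paper.

```python
from collections import Counter

def overlap_similarity(mention_a, mention_b, feature_counts, total_mentions):
    """Toy feature-overlap similarity between two author mentions.

    mention_a / mention_b: sets of feature values (coauthor names, venue,
    title terms, ...). feature_counts: Counter of how often each value
    occurs across the name block. total_mentions: number of mentions in
    the block. Shared values that are rare in the block contribute more
    than common ones; this is an illustrative stand-in, not the paper's
    exact probabilistic measure.
    """
    score = 0.0
    for value in mention_a & mention_b:              # overlapping features only
        rel_freq = feature_counts[value] / total_mentions
        score += 1.0 - rel_freq                      # rare shared features weigh more
    return score

# Hypothetical block of three mentions sharing the same name string
block = [
    {"coauthor:J. Smith", "venue:JCDL"},
    {"coauthor:J. Smith", "coauthor:A. Lee", "venue:JCDL"},
    {"coauthor:B. Chen", "venue:JCDL"},
]
counts = Counter(v for mention in block for v in mention)
print(overlap_similarity(block[0], block[1], counts, len(block)))  # shares a rarer coauthor -> higher score
print(overlap_similarity(block[0], block[2], counts, len(block)))  # shares only the ubiquitous venue -> 0.0
```

In this reading, two mentions that share a rare coauthor name score higher than two mentions that only share a feature value common to the whole block.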

Cited by 11 publications (21 citation statements); references 14 publications. Citing publications span 2019 to 2024.
“…Considering that previous studies have shown that author name and coauthorship play the most important roles in distinguishing authors [5,18,19,47], disambiguation models learned on AMINER may have incorrectly assigned author name instances with similar name strings and coauthor names to different people, producing a greater number of false negatives and leading to the lower recall scores observed for transfer learning from AMINER described in our discussion of Figs. 4-7 above.…”
Section: Feature Similarity Score Distribution (mentioning)
confidence: 88%
“…This allows attributes to be included flexibly, without having to specify the corresponding weights separately. The results reported in Backes (2018a) suggest that using equal weights for all attributes produces good results. Each iteration of the clustering process merges all pairs of current clusters whose similarity exceeds l. The quality limit l is designed to depend linearly on the block size |X|, whereby the parameter specifies this relationship (see Eq.…”
Section: Implementation of the Four Selected Disambiguation Approaches (mentioning)
confidence: 94%
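Read literally, the passage above describes an agglomerative loop that keeps merging clusters of author mentions while some pair exceeds the quality limit l. A minimal sketch under that reading, with a greedy one-pair-at-a-time merge as a simplification and `cluster_similarity` assumed to sum equally weighted attribute similarities, could be:

```python
def agglomerative_merge(clusters, cluster_similarity, quality_limit):
    """Greedy sketch of the merging loop: while any pair of current clusters
    is more similar than the quality limit l, merge it and start over.
    The quoted passage describes merging *all* qualifying pairs per
    iteration; merging one pair at a time is a simplification with the
    same stopping condition. `clusters` is a list of sets of
    author-mention ids; `cluster_similarity(a, b)` is assumed to sum
    equally weighted per-attribute similarities of the two clusters."""
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if cluster_similarity(clusters[i], clusters[j]) > quality_limit:
                    clusters[i] = clusters[i] | clusters[j]  # union of mention ids
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters
```

With equal attribute weights, `cluster_similarity` could simply add up the per-attribute overlap scores of the two clusters' pooled features; no attribute-specific weights need to be tuned.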
“…This threshold can be determined globally for all pairs of author mentions being compared, or it can vary with the number of author mentions within a block that refer to a single name representation. Block-size-dependent thresholds aim to reduce the problem of an increasing number of false links caused by the higher number of comparisons between author mentions in larger name blocks (Backes, 2018a; Caron & van Eck, 2014).…”
Section: Publication Title (mentioning)
confidence: 99%
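The two threshold strategies contrasted here can be sketched as simple functions. The linear form and the constants a and b below are assumptions for illustration, not values from Backes (2018a) or Caron & van Eck (2014):

```python
def global_threshold(t: float) -> float:
    """Same acceptance threshold for every pair, regardless of block size."""
    return t

def block_size_threshold(block_size: int, a: float = 0.5, b: float = 0.01) -> float:
    """Threshold that grows linearly with the number of author mentions in
    the name block, so that larger blocks (more pairwise comparisons and
    therefore more chances for false links) require stronger evidence
    before two mentions are merged. The linear form and the constants
    a and b are illustrative assumptions, not published values."""
    return a + b * block_size
```

Under this sketch, a merge between two mentions in a block X would be accepted only if their similarity exceeds `block_size_threshold(len(X))`, so larger blocks demand stronger evidence.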