As scholarly data grows rapidly, scholarly digital libraries (SDLs), which supply publication data through convenient online interfaces, have become popular and important tools for researchers. Researchers use SDLs for various purposes, including searching for the publications of an author, assessing an author's impact through citations, and identifying an author's research topics. However, common names make it difficult to correctly identify one author's works among a large number of scholarly publications. Abbreviated first and middle names make it even harder to distinguish authors whose names have the same representation (i.e., spelling). Several disambiguation methods have addressed this problem under their own assumptions, typically that inputs such as the number of same-named authors, training sets, or rich and clear information about papers are available. Given the size of today's scholarly records and their inconsistent formats, we expect these assumptions to be very hard to meet. We rely on the common assumption that coauthors are likely to write more than one paper together, and propose an unsupervised approach that groups papers by the same author using only the most commonly available information: author lists. We represent each paper as a point in an author-name space, apply dimension reduction to find author names that frequently appear together in papers, and cluster papers with a vector similarity measure well suited to the name disambiguation task. The main advantage of our approach is that it uses only coauthor information as input. We evaluate our method on publication records collected from DBLP and show that it achieves better disambiguation than five other clustering methods in terms of cluster purity and fragmentation.
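The pipeline summarized above (papers as points in an author-name space, dimension reduction, similarity-based clustering) can be sketched in Python as follows. The abstract does not name the specific techniques, so this is a minimal sketch under stated assumptions: truncated SVD via `numpy.linalg.svd` stands in for the dimension-reduction step, cosine similarity stands in for the similarity measure, and single-link grouping at a fixed similarity threshold stands in for the clustering step. The function names, the `k` and `threshold` parameters, and the toy author lists are hypothetical. Cluster purity uses its standard definition; fragmentation is omitted because its exact definition is not given here.

```python
from collections import Counter

import numpy as np


def build_matrix(papers, target):
    """Binary paper-by-coauthor matrix (hypothetical helper).
    The ambiguous target name is excluded: it appears in every paper
    and therefore carries no discriminating signal."""
    names = sorted({a for authors in papers for a in authors if a != target})
    index = {name: i for i, name in enumerate(names)}
    X = np.zeros((len(papers), len(names)))
    for row, authors in enumerate(papers):
        for a in authors:
            if a != target:
                X[row, index[a]] = 1.0
    return X, names


def reduce_dims(X, k):
    """Assumed technique: truncated SVD keeping the top-k latent dimensions,
    each roughly a group of coauthor names that tend to co-occur."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] * s[:k]


def cosine_similarity(Z):
    """Pairwise cosine similarity between rows (assumed similarity measure)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms < 1e-12] = 1.0  # papers with no coauthors get similarity 0
    Zn = Z / norms
    return Zn @ Zn.T


def single_link_clusters(S, threshold):
    """Union papers whose similarity >= threshold (assumed clustering rule)."""
    n = S.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if S[i, j] >= threshold:
                parent[find(i)] = find(j)
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def disambiguate(papers, target, k=2, threshold=0.5):
    X, _ = build_matrix(papers, target)
    S = cosine_similarity(reduce_dims(X, k))
    return single_link_clusters(S, threshold)


def purity(pred, truth):
    """Standard cluster purity: fraction of papers matching their
    cluster's majority true author."""
    groups = {}
    for p, t in zip(pred, truth):
        groups.setdefault(p, []).append(t)
    return sum(Counter(ts).most_common(1)[0][1] for ts in groups.values()) / len(pred)


# Hypothetical toy records: every paper lists the ambiguous name "J. Kim".
papers = [
    ["J. Kim", "A. Lee", "B. Park"],
    ["J. Kim", "A. Lee"],
    ["J. Kim", "B. Park", "A. Lee"],
    ["J. Kim", "C. Choi", "D. Jung"],
    ["J. Kim", "D. Jung"],
]
labels = disambiguate(papers, "J. Kim")
print(labels)                            # [0, 0, 0, 1, 1]
print(purity(labels, [0, 0, 0, 1, 1]))  # 1.0
```

In this toy input the coauthors split into two disjoint communities ({A. Lee, B. Park} and {C. Choi, D. Jung}), so the top two SVD dimensions separate them and each community becomes one "J. Kim". Real records would need a tuned `k` and `threshold`, which the abstract does not specify.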