Using a knowledge base to disambiguate personal name in web search results

Vu, Quang Minh; Masada, Tomonari; Takasu, Atsuhiro; Adachi, Jun

doi:10.1145/1244002.1244188

Cited by 14 publications

(10 citation statements)

References 6 publications


“…The first method is based on a hierarchical clustering strategy and the second one makes use of social networks. Vu et al (2007) propose the use of Web directories as a knowledge base to disambiguate personal names in Web search results, whereas Bekkerman and McCallum (2005) present two methods for addressing this same problem, one based on the link structure of the Web pages and the other one using agglomerative/conglomerative double clustering, a multi‐way distributional clustering. Galvez and de Moya‐Anegón (2007) address the problem of conflating personal name variants in a canonical form using binary matrices and finite‐state graphs.…”

Section: Related Work (mentioning)

confidence: 99%

An unsupervised heuristic‐based hierarchical method for name disambiguation in bibliographic citations

Cota, Ferreira, Nascimento et al. 2010

J. Am. Soc. Inf. Sci.

125


Name ambiguity in the context of bibliographic citations is a difficult problem which, despite the many efforts from the research community, still has a lot of room for improvement. In this article, we present a heuristic-based hierarchical clustering method to deal with this problem. The method successively fuses clusters of citations of similar author names based on several heuristics and similarity measures on the components of the citations (e.g., coauthor names, work title, and publication venue title). During the disambiguation task, the information about fused clusters is aggregated, providing more information for the next round of fusion. To demonstrate the effectiveness of our method, we ran a series of experiments on two different collections extracted from real-world digital libraries and compared it, under two metrics, with four representative methods described in the literature. We present comparisons of results using each considered attribute separately (i.e., coauthor names, work title, and publication venue title) with the author name attribute and using all attributes together. These results show that our unsupervised method, when using all attributes, performs competitively against all other methods under both metrics, losing only in one case against a supervised method, whose result was very close to ours. Moreover, such results are achieved without the burden of any training and without using any privileged information such as knowing a priori the correct number of clusters.
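The successive-fusion idea in this abstract can be illustrated with a small sketch. It is not Cota et al.'s actual procedure: the name-compatibility rule (same surname and first initial), the shared-coauthor rule, the Jaccard thresholds on title and venue words, and all helper names (compatible_names, should_fuse, fuse, disambiguate) are hypothetical choices made for illustration. Each fused cluster keeps the union of its members' attributes, so later rounds compare against aggregated evidence, as the abstract describes.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Citations assumed to belong to one author; attributes are aggregated on fusion."""
    name: str
    coauthors: set = field(default_factory=set)
    title_words: set = field(default_factory=set)
    venue_words: set = field(default_factory=set)
    ids: list = field(default_factory=list)


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compatible_names(n1: str, n2: str) -> bool:
    """Hypothetical heuristic: same surname and same first initial."""
    p1, p2 = n1.lower().split(), n2.lower().split()
    return p1[-1] == p2[-1] and p1[0][0] == p2[0][0]


def should_fuse(a: Cluster, b: Cluster, t_title: float = 0.2, t_venue: float = 0.2) -> bool:
    """Hypothetical fusion rule: compatible names plus a shared coauthor,
    or compatible names plus similar title AND venue vocabulary."""
    if not compatible_names(a.name, b.name):
        return False
    if a.coauthors & b.coauthors:
        return True
    return (jaccard(a.title_words, b.title_words) >= t_title
            and jaccard(a.venue_words, b.venue_words) >= t_venue)


def fuse(a: Cluster, b: Cluster) -> Cluster:
    """Merge two clusters, aggregating their attributes for later rounds."""
    return Cluster(a.name, a.coauthors | b.coauthors, a.title_words | b.title_words,
                   a.venue_words | b.venue_words, a.ids + b.ids)


def disambiguate(citations: list) -> list:
    """Start with one cluster per citation; fuse pairs until no rule fires."""
    clusters = [Cluster(c["author"], set(c["coauthors"]), set(c["title"].lower().split()),
                        set(c["venue"].lower().split()), [c["id"]]) for c in citations]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_fuse(clusters[i], clusters[j]):
                    clusters[i] = fuse(clusters[i], clusters[j])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan over the updated cluster list
    return sorted(sorted(c.ids) for c in clusters)


# Toy data (invented for illustration only).
citations = [
    {"id": 1, "author": "J. Smith", "coauthors": ["A. Lee"],
     "title": "name disambiguation in digital libraries", "venue": "JCDL"},
    {"id": 2, "author": "John Smith", "coauthors": ["A. Lee", "B. Kim"],
     "title": "clustering citations", "venue": "JCDL"},
    {"id": 3, "author": "J. Smith", "coauthors": ["C. Wu"],
     "title": "protein folding kinetics", "venue": "Biophysical Journal"},
]
print(disambiguate(citations))  # [[1, 2], [3]]
```

Citations 1 and 2 fuse through the shared coauthor despite the differently written first name; citation 3 has a compatible name but no overlapping coauthor, title, or venue evidence, so it stays in its own cluster.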



“…A great deal of research has focused on the name disambiguation problem in different types of data, such as geographic name disambiguation [6], biomedical term disambiguation [7], and personal name disambiguation [8]. Several papers [1,9,10,11] have also focused on using the content in citations to solve the name disambiguation problem.…”

Section: Related Work (mentioning)

confidence: 99%

Author Name Disambiguation for Citations Using Topic and Web Correlation

Yang, Peng, Jiang et al. 2008

Research and Advanced Technology for Digital Libraries


Abstract. Today, bibliographic digital libraries play an important role in helping members of the academic community search for novel research. In particular, author disambiguation for citations is a major problem during the data integration and cleaning process, since author names are usually very ambiguous. To solve this problem, we proposed two kinds of correlations between citations, namely Topic Correlation and Web Correlation, to exploit relationships between citations in order to identify whether two citations with the same author name refer to the same individual. The topic correlation measures the similarity between the research topics of two citations, while the Web correlation measures the number of co-occurrences in web pages. We employ a pair-wise grouping algorithm to group citations into clusters. The experimental results show that disambiguation accuracy improves considerably when topic correlation and Web correlation are used, and that Web correlation provides stronger evidence about the authors of citations.
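The pair-wise grouping idea in this abstract can be illustrated with a small sketch. Several assumptions here are not taken from Yang et al.: topic correlation is approximated by cosine similarity of title word counts, Web correlation is read from a hypothetical precomputed table of page co-occurrence counts (web_cooccur; no real search API is queried), the thresholds t_topic and t_web are arbitrary, and pairs that pass either test are merged transitively with union-find.

```python
import math
from collections import Counter
from itertools import combinations


def topic_correlation(t1: str, t2: str) -> float:
    """Stand-in for topic correlation: cosine similarity of title word counts."""
    v1, v2 = Counter(t1.lower().split()), Counter(t2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = math.sqrt(sum(x * x for x in v1.values())) * math.sqrt(sum(x * x for x in v2.values()))
    return dot / norm if norm else 0.0


def pairwise_group(citations, web_cooccur, t_topic=0.3, t_web=1):
    """Link two citations when either correlation clears its (assumed) threshold,
    then return the connected components found with union-find."""
    parent = list(range(len(citations)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(citations)), 2):
        topic = topic_correlation(citations[i]["title"], citations[j]["title"])
        # Web correlation: hypothetical precomputed count of pages mentioning both.
        web = web_cooccur.get(frozenset((citations[i]["id"], citations[j]["id"])), 0)
        if topic >= t_topic or web >= t_web:
            parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(citations):
        groups.setdefault(find(i), []).append(c["id"])
    return sorted(sorted(g) for g in groups.values())


# Toy data (invented for illustration only).
citations = [
    {"id": "a", "title": "author name disambiguation for citations"},
    {"id": "b", "title": "name disambiguation in citations"},
    {"id": "c", "title": "gene expression in yeast"},
    {"id": "d", "title": "protein interaction networks"},
]
web_cooccur = {frozenset(("c", "d")): 3}
print(pairwise_group(citations, web_cooccur))  # [['a', 'b'], ['c', 'd']]
```

Here "b" and "c" share only the word "in" (cosine 0.25), which stays below the assumed 0.3 threshold, while "c" and "d" are joined purely through the Web co-occurrence count.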


“…In [9], we proposed a modification method for tf to improve measurement of terms' weight. Here, we present a brief summarization of that method.…”

Section: Modification of tf (mentioning)

confidence: 99%

“…Therefore, we try to evaluate all clusters whose sizes are larger than three. The details on calculation of evaluation metrics can be found in [9]. The evaluation results of four methods VSM, NER, SKB1 and SKB2 and comparison among them are shown in Table 3.…”

Section: Evaluation of Large Clusters (mentioning)

confidence: 99%