Proceedings of the 17th ACM Conference on Information and Knowledge Management 2008
DOI: 10.1145/1458082.1458150

Learning to Link with Wikipedia

Abstract: This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify significant terms within unstructured text, and enrich it with links to the appropriate Wikipedia articles. The resulting link detector and disambiguator performs very well, with recall and precision of almost 75%. This performance is constant whether the system is evaluated on Wikipedia articles or "real world" documents. This work h…
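For context, the disambiguation step the abstract refers to scores each candidate Wikipedia article for an ambiguous term by combining its commonness (how often that term's anchor text links to the article in Wikipedia) with its relatedness to unambiguous context terms. The following Python sketch illustrates that scoring idea only; the function names and the fixed equal weighting are illustrative assumptions, whereas the paper trains a classifier over such features.

def disambiguate(candidates, context_articles, relatedness):
    """Pick the most likely sense for an ambiguous anchor term.
    candidates: list of (article_id, commonness) pairs; commonness is the
        fraction of Wikipedia anchors with this text that link to article_id.
    context_articles: article ids chosen for unambiguous nearby terms.
    relatedness: callable(article_a, article_b) -> float in [0, 1].
    """
    def context_fit(article):
        # Average relatedness of a candidate sense to the context articles.
        if not context_articles:
            return 0.0
        return sum(relatedness(article, c) for c in context_articles) / len(context_articles)
    # Illustrative stand-in for the paper's learned classifier: an
    # equal-weight combination of commonness and context fit.
    return max(candidates, key=lambda ac: 0.5 * ac[1] + 0.5 * context_fit(ac[0]))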


Cited by 933 publications (907 citation statements). References 9 publications.
“…Each text has all the meaning-bearing phrases annotated with at most one DBpedia resource. The Aquaint50 dataset contains 50 documents from the AQUAINT corpus that were used by Milne and Witten [19]. They have been linked and disambiguated to Wikipedia articles by their system, and the results were evaluated using Amazon Mechanical Turk.…”
Section: Evaluated Methods
Citation type: mentioning (confidence: 99%)
“…This paper mines associative relations between entities from Wikipedia, using the method proposed by Milne and Witten (2008). Specifically, given two entities $e_1$ and $e_2$, we compute the semantic distance $\mathrm{dist}(e_1, e_2)$ between them as:…”
Section: Mining Entity Semantic Knowledge
Citation type: mentioning (confidence: 99%)
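The formula elided in the quote above is, in Milne and Witten's work, their link-based measure, which adapts the Normalized Google Distance to Wikipedia's link graph. A minimal Python sketch, assuming each entity is represented by the set of articles that link to it (variable names here are illustrative, not from either paper):

from math import log

def milne_witten_distance(in_links_a, in_links_b, total_articles):
    """NGD-style link distance between two Wikipedia entities.
    in_links_a, in_links_b: sets of ids of articles linking to each entity.
    total_articles: number of articles in Wikipedia.
    Returns 0.0 for identical link profiles; larger means less related.
    """
    overlap = in_links_a & in_links_b
    if not overlap:
        return float("inf")  # no shared in-links: treat as maximally distant
    numerator = log(max(len(in_links_a), len(in_links_b))) - log(len(overlap))
    denominator = log(total_articles) - log(min(len(in_links_a), len(in_links_b)))
    return numerator / denominator

A relatedness score is commonly derived from this distance as 1 minus the distance, clipped to [0, 1].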
“…A famous example of this kind of similarity measure is the Normalized Google Distance [9], which uses Google as a corpus of documents. We use the Wikipedia [27] relatedness measure, as defined by Milne et al. [20,21], because of its ease of use. This distance adapts the Normalized Google Distance to use Wikipedia as a corpus of reference for computation.…”
Section: Combining Probabilistic Semantic Similarity Measures Within…
Citation type: mentioning (confidence: 99%)
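For reference, the Normalized Google Distance that the Wikipedia measure adapts is standardly written as follows, with $f(x)$ the number of pages containing term $x$, $f(x, y)$ the number containing both terms, and $N$ the total number of indexed pages. This is the textbook formulation, reproduced for context rather than quoted from the citing paper:

\mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}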
“…with probability $\frac{\alpha}{n+\alpha}$, where $x_i^*$ is one of the $k$ unique values among the observations gathered.…”
Section: Preliminaries: Dirichlet Process
Citation type: mentioning (confidence: 99%)
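The fragment above is part of the standard Blackwell-MacQueen (Chinese restaurant process) predictive rule for a Dirichlet process $\mathrm{DP}(\alpha, G_0)$. The full rule, reconstructed here from the standard definition rather than from the citing paper, is:

x_{n+1} \mid x_1, \ldots, x_n \sim
\begin{cases}
  \delta_{x_i^*} & \text{with probability } \frac{n_i}{n + \alpha}, \qquad i = 1, \ldots, k, \\
  G_0            & \text{with probability } \frac{\alpha}{n + \alpha},
\end{cases}

where $n_i$ is the number of observations equal to $x_i^*$ and $\sum_{i=1}^{k} n_i = n$.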