2013
DOI: 10.1016/j.artint.2012.03.006

Learning multilingual named entity recognition from Wikipedia

Abstract: We present a corpus of sentence-aligned triples of German audio, German text, and English translation, based on German audio books. The corpus consists of over 100 hours of audio material and over 50k parallel sentences. The audio data is read speech and thus low in disfluencies. The quality of audio and sentence alignments has been checked by a manual evaluation, showing that speech alignment quality is in general very high. The sentence alignment quality is comparable to well-used parallel translation data a…

Cited by 277 publications (211 citation statements)
References 37 publications
“…They are commonly used to train statistical machine learners but are limited in scope due to the cost of manual annotation. This is a problem because others have shown that more training data leads to higher accuracy language models [1], [2].…”
Section: Introduction (mentioning)
confidence: 99%
“…With the development of multilingual Wikipedia, researchers have been employing it in many multilingual applications [3,16,17,20,23,24]. Similar to the English-only contexts, each dimension in a multilingual context representation vector represented the relatedness of the target entity with a set of entities/words in the corresponding language.…”
Section: Related Work (mentioning)
confidence: 99%
“…Richman and Schone utilized the multilingual characteristics of Wikipedia to annotate a large corpus of text with NER tags [14]. Similarly, Nothman et al [15] automatically created multilingual training annotations for NER by exploiting the text and structure of parallel Wikipedia articles in different languages.…”
Section: Named Entity Extraction (mentioning)
confidence: 99%