2010 IEEE International Conference on Acoustics, Speech and Signal Processing 2010
DOI: 10.1109/icassp.2010.5494962

Unsupervised knowledge acquisition for Extracting Named Entities from speech

Abstract: This paper presents a Named Entity Recognition (NER) method dedicated to processing speech transcriptions. The main principle behind this method is to collect, in an unsupervised way, lexical knowledge for all entries in the ASR lexicon. This knowledge is gathered with two methods: by automatically extracting NEs from a very large set of textual corpora and by directly exploiting the structure contained in the Wikipedia resource. This lexical knowledge is used to update the statistical models of…

Citations: cited by 10 publications (12 citation statements)
References: 5 publications
“…The proposed systems are evaluated using either manual or automatic segmentations and transcripts. In the two cases, the named entities are automatically detected using [1]. The progress done on all elementary tasks (speaker diarization, speech recognition and the use of belief functions) decreases the speaker identification error rate from 75.15% [8] to 60.59%.…”
Section: Results (mentioning)
confidence: 99%
“…Our acoustic speaker identification system is based on the well-known UBM/GMM approach developed in the ALIZE toolkit. First, a pre-processing concerning training, development and test sets is necessary.…”
Section: Speaker Identification Based On GMM (mentioning)
confidence: 99%
“…Toponym identification in such a corpus is a difficult task. Most often, processes for NE identification involve either a symbolic approach based on local grammars (Bontcheva, Dimitrov, Maynard, Tablan, & Cunningham, 2002; Friburger & Maurel, 2004; Poibeau, 2003) or a statistical approach based on automatic learning, or hybrid systems such as Béchet and Charton (2010), and Leidner (2007). Our approach is guided by corpus specifications: located information, spelling variations and a very large number of toponyms.…”
Section: Custom-made Map Requests and Title Corpus (mentioning)
confidence: 99%
“…Neighboring words and POS tags: They are acknowledged to be efficient for NER [73,74]. In fact, we use these features not only for training our word embeddings, but also for training the baseline model of NER (Section 3.2.1).…”
Section: Feature Extraction For Word Embeddings (mentioning)
confidence: 99%
“…Nevertheless the adaptation of NER methods to conversational speech remains challenging due to, for example, case insensitivity, lack of punctuations, un-grammatical structure, repetition, and presence of disfluencies inherent to conversations. In addition, there is not much spoken data annotated with named entities to cover the huge variety of named entity instances likely occurring in speech, and simply increasing the amount of manual annotation is not realistic for reasons of cost, evolution of new spoken terms and diversity. Several works on NER from spoken contents have already explored the use of external resources like online gazetteers [73] and Wikipedia [74] to overcome the lack of annotations. Gazetteers, for instance, have successfully boosted NER performance for given entities (e.g., Location), but do not convey the information related to the context words surrounding the entity names that are also important for NER.…”
mentioning
confidence: 99%