Cross-site combination and evaluation of subword spoken term detection systems

Mertens, Timo; Wallace, Roy; Schneider, Daniel

doi:10.1109/cbmi.2011.5972521

Cited by 2 publications

(2 citation statements)

References 15 publications


“…This, in turn, consists of two parts: first, a pronunciation is hypothesized using phonemic subword units, and second, said pronunciation is converted to a spelling. Only generating a pronunciation for an unknown word is sufficient in applications such as Spoken Term Detection (STD) [6], where phonemic representations of speech are adequate for indexing and search. For transcription, however, an orthography needs to be estimated from a given phonemic subword sequence, as in [7], where phone transcriptions are converted to spellings using memory-based learning for Dutch OOV words.…”

Section: Introduction (mentioning, confidence: 99%)

Subword-based automatic lexicon learning for Speech Recognition

Mertens, Seneff (2011)

2011 IEEE Workshop on Automatic Speech Recognition & Understanding

Self Cite


We present a framework for learning a pronunciation lexicon for an Automatic Speech Recognition (ASR) system from multiple utterances of the same training words, where the lexical identities of the words are unknown. Instead of only trying to learn pronunciations for known words, we go one step further and try to learn both spelling and pronunciation in a joint optimization. Decoding based on linguistically motivated hybrid subword units generates the joint lexical search space, which is reduced to the most appropriate lexical entries by a set of simple pruning techniques. A cascade of letter and acoustic pruning, followed by re-scoring of N-best hypotheses with discriminative decoder statistics, resulted in optimal lexical entries in terms of both spelling and pronunciation. Evaluated on English isolated word recognition, the framework achieves absolute reductions of 7.7% in word error rate and 20.9% in character error rate over baselines that use no pruning.
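The abstract describes a three-stage selection: letter pruning, then acoustic pruning, then N-best re-scoring. A minimal Python sketch of such a cascade follows. It is not the authors' implementation: the beam widths, the score fields, and the `top1_count` statistic (how many training utterances decoded to a candidate as the 1-best entry) are hypothetical stand-ins, since the abstract does not say which letter model or discriminative decoder statistics were used.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    """A hypothetical joint (spelling, pronunciation) lexical entry."""
    spelling: str
    pronunciation: Tuple[str, ...]
    letter_logprob: float           # assumed: spelling score under a letter n-gram model
    acoustic_logprobs: List[float]  # assumed: one decoder score per training utterance
    top1_count: int                 # assumed stand-in for a "decoder statistic"


def total_acoustic(c: Candidate) -> float:
    """Acoustic log-probability pooled over all utterances of the word."""
    return sum(c.acoustic_logprobs)


def beam_prune(cands: List[Candidate], score: Callable[[Candidate], float],
               beam: float) -> List[Candidate]:
    """Keep candidates whose score lies within `beam` of the best score."""
    if not cands:
        return []
    best = max(score(c) for c in cands)
    return [c for c in cands if score(c) >= best - beam]


def select_entry(cands: List[Candidate], letter_beam: float = 5.0,
                 acoustic_beam: float = 10.0, n_best: int = 3) -> Optional[Candidate]:
    # Stage 1: letter pruning discards implausible spellings.
    cands = beam_prune(cands, lambda c: c.letter_logprob, letter_beam)
    # Stage 2: acoustic pruning on the score pooled across utterances.
    cands = beam_prune(cands, total_acoustic, acoustic_beam)
    # Stage 3: keep the N-best by acoustic score, then re-score them by the
    # decoder statistic, breaking ties by acoustic score.
    nbest = sorted(cands, key=total_acoustic, reverse=True)[:n_best]
    if not nbest:
        return None
    return max(nbest, key=lambda c: (c.top1_count, total_acoustic(c)))
```

Ordering the cheap letter test before the acoustic test mirrors the cascade order stated in the abstract; the sketch simply uses beam thresholds as one plausible pruning rule.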

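The first citation statement notes that for STD a phonemic representation of speech is adequate for indexing and search. As a rough illustration, assuming 1-best phone transcripts rather than lattices, and exact matching rather than the approximate matching real STD systems typically need, a phone n-gram inverted index can be sketched as follows; `build_index`, `search`, and the n-gram order are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Posting = Tuple[str, int]  # (utterance id, start position in the phone string)


def build_index(transcripts: Dict[str, Sequence[str]],
                n: int = 3) -> Dict[Tuple[str, ...], List[Posting]]:
    """Map every phone n-gram to the places where it occurs."""
    index: Dict[Tuple[str, ...], List[Posting]] = defaultdict(list)
    for utt_id, phones in transcripts.items():
        for pos in range(len(phones) - n + 1):
            index[tuple(phones[pos:pos + n])].append((utt_id, pos))
    return dict(index)


def search(index: Dict[Tuple[str, ...], List[Posting]],
           transcripts: Dict[str, Sequence[str]],
           query_phones: Sequence[str], n: int = 3) -> List[Posting]:
    """Exact-match lookup: fetch postings of the query's first n-gram, then verify the rest."""
    if len(query_phones) < n:
        raise ValueError("query must contain at least n phones")
    hits = []
    for utt_id, pos in index.get(tuple(query_phones[:n]), []):
        if list(transcripts[utt_id][pos:pos + len(query_phones)]) == list(query_phones):
            hits.append((utt_id, pos))
    return hits
```

Because queries are phone strings, a term outside the recognizer's word vocabulary can still be searched once a pronunciation is hypothesized for it, which is the point the quoted passage makes.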


“…Despite the bad performance exhibited by the configuration 4a corresponding to system 4, it must be noted that this was not optimized for the final metric (i.e., ATWV) but to get a predefined hit coverage, which greatly affects the final ATWV performance [62] and hence, a fair comparison with the rest of the systems cannot be made.…”

Section: Performance Analysis of the QbE STD Systems for Specific Que… (mentioning, confidence: 99%)

Query-by-Example Spoken Term Detection ALBAYZIN 2012 evaluation: overview, systems, results, and discussion

Tejedor, Toledano, Anguera, et al. (2013)

EURASIP Journal on Audio, Speech, and Music Processing


Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. It has recently been receiving much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) and keyword spotting (KWS)/spoken term detection (STD): ASR is interested in all the terms/words that appear in the speech signal, and KWS/STD relies on a textual transcription of the search term to retrieve the speech data. This paper presents the systems submitted to the ALBAYZIN 2012 QbE STD evaluation, held as part of the ALBAYZIN 2012 evaluation campaign within the context of the IberSPEECH 2012 Conference. The evaluation consists of retrieving the speech files that contain the input queries, indicating their start and end timestamps within the appropriate speech file. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from MAVIR workshops, which amount to about 7 h of speech in total. We present the database, the metric, the submitted systems, and all results, along with some discussion. Four different research groups took part in the evaluation. The evaluation results show the difficulty of this task, and the limited performance indicates there is still a lot of room for improvement. The best result is achieved by a dynamic time warping-based search over Gaussian posteriorgrams/posterior phoneme probabilities. This paper also compares the systems, aiming to establish the best technique for this difficult task and to identify promising directions for this relatively novel task.
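The abstract reports that the best system used dynamic time warping over Gaussian posteriorgrams or phoneme posteriors, without giving details here. The sketch below shows one common form of that idea, subsequence DTW, in which the query may begin and end anywhere in the utterance. The local distance (negative log inner product of posterior vectors), the step pattern, and the end-point length normalization are assumptions, not the winning system's settings.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one posterior vector (probabilities over classes)


def local_distance(q: Frame, x: Frame, eps: float = 1e-10) -> float:
    """Negative log inner product of two posterior vectors."""
    dot = sum(a * b for a, b in zip(q, x))
    return -math.log(max(dot, eps))


def subsequence_dtw(query: List[Frame], utterance: List[Frame]) -> Tuple[float, int, int]:
    """Align the whole query to the best stretch of the utterance.

    Returns (length-normalized cost, start frame, end frame) in utterance frames.
    """
    if not query or not utterance:
        raise ValueError("query and utterance must be non-empty")
    n, m = len(query), len(utterance)
    cost = [[math.inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]
    length = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = local_distance(query[i], utterance[j])
            if i == 0:
                # Free start: the first query frame may align to any utterance frame.
                cost[i][j], start[i][j], length[i][j] = d, j, 1
                continue
            best_cost, best_start, best_len = math.inf, 0, 0
            # Steps: diagonal, vertical (query advances), horizontal (utterance advances).
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pj >= 0 and cost[pi][pj] < best_cost:
                    best_cost, best_start, best_len = cost[pi][pj], start[pi][pj], length[pi][pj]
            cost[i][j] = best_cost + d
            start[i][j] = best_start
            length[i][j] = best_len + 1
    # Free end: choose the end frame with the lowest length-normalized cost.
    end = min(range(m), key=lambda j: cost[n - 1][j] / length[n - 1][j])
    return cost[n - 1][end] / length[n - 1][end], start[n - 1][end], end
```

Because the first query row has no predecessor and the end frame is chosen freely, the same query can be matched at any position in a long recording, which is what the start and end timestamps required by the evaluation call for.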

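ATWV, the metric named in the quoted passage, is usually computed as in the NIST 2006 STD evaluation plan: ATWV = 1 - (1/|T|) * sum over terms t of [P_miss(t) + beta * P_FA(t)], where P_miss(t) = 1 - N_correct(t)/N_true(t), P_FA(t) = N_FA(t)/(T_speech - N_true(t)), T_speech is the amount of speech in seconds, and beta = 999.9. The sketch below assumes those NIST parameters; `atwv` and its input layout are hypothetical.

```python
from typing import Dict, Tuple

# term -> (N_true, N_correct, N_false_alarm) at the system's actual decision threshold
TermCounts = Dict[str, Tuple[int, int, int]]


def atwv(counts: TermCounts, speech_seconds: float, beta: float = 999.9) -> float:
    """Actual term-weighted value: 1 minus the mean over terms of P_miss + beta * P_FA."""
    losses = []
    for n_true, n_correct, n_fa in counts.values():
        if n_true == 0:
            continue  # terms with no reference occurrences are excluded from the average
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials: one per second of speech, minus the true occurrences.
        p_fa = n_fa / (speech_seconds - n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no term has reference occurrences")
    return 1.0 - sum(losses) / len(losses)
```

Because beta weights the false-alarm rate heavily, a system that emits many low-confidence detections to raise hit coverage can lose ATWV, which fits the quoted remark that configuration 4a cannot be compared fairly with the other systems.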
