IEEE International Conference on Acoustics Speech and Signal Processing 2002
DOI: 10.1109/icassp.2002.5743871
Context-dependent acoustic modeling using graphemes for large vocabulary speech recognition

Cited by 26 publications (34 citation statements). References 12 publications.
“…The first system, or "Grapheme" in the figure, with 26 letter units, starts 3% absolute lower than the "Phoneme Baseline w/ Autogen Prons" system for the smallest training set, but outperforms it as the amount of training data increases (76.9% vs 76.5% for the largest training set). This is consistent with Kanthak's and Killer's observations [7,8] (Kanthak's English training set contained less than 100 hours of speech). It is also consistent with our intuition that training data can somewhat compensate for the acoustic diversity of English letters by implicitly modeling the various sounds corresponding to each letter symbol.…”
Section: Results (supporting, confidence: 89%)
“…Kanthak et al [7] and Killer et al [8] observed experimentally that for some languages, grapheme systems performed roughly as well as phoneme systems, but that for others, such as English, there was a high error-rate cost to moving to graphemes. The authors attributed this to the poor spelling-to-pronunciation correspondence of the English language, which is another way of observing that, in English, letter units lack acoustic consistency, and that consistency matters, much as Cravero et al had suggested.…”
Section: Introduction (mentioning, confidence: 99%)
“…This is often unavailable or may be inconsistent if derived from multiple sources. Alternatively a grapheme-based speech recognition system [1,2] could be built. The recogniser then only needs an orthographic lexicon to specify the vocabulary rather than a pronunciation lexicon.…”
Section: Introduction (mentioning, confidence: 99%)
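The point above is that a grapheme-based recogniser needs only the spelling of each word: the "pronunciation" of a word is simply its letter sequence. A minimal sketch of deriving such a grapheme lexicon from an orthographic word list (the word list and function name here are illustrative, not from the paper):

```python
# Minimal sketch: build a grapheme lexicon directly from orthography,
# assuming alphabetic words. Each word maps to its sequence of letter
# units, so no hand-crafted pronunciation dictionary is required.

def grapheme_lexicon(words):
    """Map each word to its grapheme units (its lowercased letters)."""
    return {w: list(w.lower()) for w in words}

lexicon = grapheme_lexicon(["speech", "model"])
# {'speech': ['s', 'p', 'e', 'e', 'c', 'h'], 'model': ['m', 'o', 'd', 'e', 'l']}
```

Context-dependent modeling, as in the paper's title, would then be applied over these letter units rather than phonemes.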
“…There is a wide variety of solutions that address these problems in different ways, ranging from the recognition of large vocabularies [17], through the recognition of spoken numbers for telephone applications [18], to the detection of segments as spoken or not spoken [19]. When working in complex environments with a limited amount of data, multilingual contexts, nonlinearities, or uncontrollable noise, some possibilities are: enriching the poor resources of a language with resources from a better-resourced language alongside it, approaches oriented to the lack of resources, cross-lingual approaches [20], training acoustic models for a new language using results from other languages [21], data optimization methods, collaborative systems, or open configuration systems. However, developing a robust ASR system is very difficult when under-resourced languages are involved, even with well-resourced languages alongside them, and the classic techniques perform poorly in terms of correct rates [22,23].…”
Section: Description of the Environment and Requirements of the System (mentioning, confidence: 99%)