2001
DOI: 10.1109/89.917681

Syllable-based large vocabulary continuous speech recognition

Abstract: Most large vocabulary continuous speech recognition (LVCSR) systems in the past decade have used a context-dependent (CD) phone as the fundamental acoustic unit. In this paper, we present one of the first robust LVCSR systems that uses a syllable-level acoustic unit for LVCSR on telephone-bandwidth speech. This effort is motivated by the inherent limitations in phone-based approaches, namely the lack of an easy and efficient way to model long-term temporal dependencies. A syllable unit spans a longer time f…

Citations: cited by 104 publications (64 citation statements)
References: 17 publications (17 reference statements)
“…Throughout the intervening years, however, a variety of alternative subword models have been studied in parallel, with the basic units including syllables [16], [17], acoustically defined units [18], [19], graphemes [20], and sub-phonetic features [21], [22], [23], [24], [25], [26]. Figure 1 serves as an informal summary of some of the main sub-word modeling approaches described in this article.…”
Section: Historical Review (mentioning)
confidence: 99%
“…To alleviate the problems of the 'beads on a string' representation of speech, several authors propose modelling the spectral and temporal variation in speech 'implicitly' by using longer-length linguistic units as the basic building blocks of speech (Ganapathiraju et al., 2001; Hämäläinen et al., 2007a; Jones et al., 1997; Jouvet and Messina, 2004; Plannerer and Ruske, 1992). For various reasons, most of these authors (Ganapathiraju et al., 2001; Hämäläinen et al., 2007a; Jones et al., 1997; Jouvet and Messina, 2004) suggest using syllable-length models.…”
Section: Introduction (mentioning)
confidence: 99%
“…Coarticulation effects, for instance, often stretch beyond the left and right neighbouring phones. The corresponding long-span spectral and temporal dependencies are not easy to capture with models that have as limited a window size as triphones (Ganapathiraju et al., 2001). Moreover, the pronunciation variants in the lexicon do not cover all variation in actual speech production (McAllaster and Gillick, 1999).…”
Section: Introduction (mentioning)
confidence: 99%