2003
DOI: 10.1109/jproc.2003.817117

Interacting with computers by voice: automatic speech recognition and synthesis

Cited by 96 publications (55 citation statements)
References 244 publications
“…The cepstral based features, MFCC and PLP, are expectedly better due to the better following of auditory scale. Similar results are reported for other languages as well [4]. According to the slightly better achievement of the MFCC over PLP features for acoustic modeling in Croatian LVASR the use of MFCC speech feature vectors is proposed.…”
Section: Speech Feature Vectors
type: supporting
confidence: 72%
“…The statistical approach uses hidden Markov models (HMM) as state of the art formalism for speech recognition. Many large vocabulary automatic speech recognition (LVASR) systems use mel-cepstral speech analysis, hidden Markov modeling of acoustic subword units, n-gram language models (LM) and n-best search of word hypothesis [1,3,4,5]. Automatic speech recognition research in languages like English, German and Japanese [6] puts its focus on recognition of spontaneous and broadcast speech.…”
Section: Introduction and Related Work
type: mentioning
confidence: 99%
“…In an HMM, we use a mixture of multivariate Gaussian distributions, parameterized by mean, variance, and mixture weight, to model speech [19]. Each phoneme has a different output distribution.…”
Section:  Hidden Markov Modelmentioning
confidence: 99%
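
The output distribution described in the statement above can be illustrated with a minimal sketch. It assumes diagonal covariances (one variance per feature dimension), which the excerpt does not specify, and the function names `diag_gaussian_logpdf` and `gmm_loglik` are hypothetical, not taken from the cited work.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed here)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a Gaussian mixture.

    weights:   mixture weights, expected to sum to 1
    means:     one mean vector per mixture component
    variances: one variance vector per mixture component
    """
    logs = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp keeps the sum stable when component densities are tiny.
    peak = max(logs)
    return peak + math.log(sum(math.exp(value - peak) for value in logs))
```

In an HMM recognizer, each state of each phoneme model would carry its own set of weights, means, and variances, so the same function is evaluated with different parameters per state.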
“…It takes time P to process an input of duration I. It is defined by the formula [1] as given below: RTF = P / I…”
Section: Performance Measurement Of Speech Recognition Approaches
type: mentioning
confidence: 99%
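
The real-time factor quoted above is a simple ratio, sketched below. The function name `real_time_factor` and its parameter names are hypothetical; both times are assumed to be in the same unit (seconds here).

```python
def real_time_factor(processing_seconds: float, input_seconds: float) -> float:
    """Return RTF = P / I for processing time P and input duration I."""
    if input_seconds <= 0:
        raise ValueError("input duration must be positive")
    return processing_seconds / input_seconds


# A recognizer that needs 30 s to decode a 60 s recording:
rtf = real_time_factor(30.0, 60.0)  # 0.5
```

An RTF below 1 means the system processes audio faster than it is spoken; an RTF above 1 means it falls behind a live input.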