2012
DOI: 10.1016/j.specom.2012.01.002

Impact of vocal effort variability on automatic speech recognition

Cited by 49 publications (18 citation statements)
References 19 publications

“…Typically, two main strategies are used to handle the mismatch problem, namely, (1) multiple model recognizer, where dedicated speaker models are obtained for different vocal efforts (e.g., [14]) and (2) multi-style models, where each model is obtained from a combination of normal speech and small amounts of speech of varying vocal efforts [15,14]. Notwithstanding, the two different methods were shown to have their advantages and disadvantages.…”
Section: Introduction (mentioning)
confidence: 99%
“…Notwithstanding, the two different methods were shown to have their advantages and disadvantages. For example, while both improve the performance of whispered speech [8,14], multiple model training requires significant amounts of whispered speech data to obtain the speaker models, which can be hard to obtain in practice. Multi-style based systems, in turn, despite requiring lower amounts of whispered speech to train the models, trade gains in whispered speech to losses in normal speech accuracy, often by the same amount [14].…”
Section: Introduction (mentioning)
confidence: 99%
“…3,4 Also, detection of high vocal effort can be applied in speech and speaker recognition in order to tackle a possible mismatch between training and testing conditions. 5,6 For all these technological applications, the performance of human listeners in shout detection serves as a natural point of comparison.…”
Section: Introduction (mentioning)
confidence: 99%
“…However there are still different research problems that have received little attention, and require more effort to make advances towards the understanding of speech communication. That is the case when there are changes in the vocal effort, which have proven to affect significantly the performance of automatic speech recognition and speaker recognition systems [1,2,3]. Particularly, whispered speech exhibits significant differences with normal phonated speech, being the main physical difference the complete lack of vocal folds vibration.…”
Section: Introduction (mentioning)
confidence: 99%
“…This strategy can improve significantly the performance of recognition systems, thus allowing for normal and whispered speech to be handled. Nevertheless, different authors suggest that for optimal applications, it is better to have dedicated models for each vocal effort and select the most likely model according to the detected vocal effort [5,3].…”
Section: Introduction (mentioning)
confidence: 99%
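
The last statement describes keeping a dedicated model per vocal effort and selecting the most likely one from the detected effort. A minimal sketch of that selection step follows. It is illustrative only and not taken from the cited papers: the names `select_recognizer`, `effort_scores`, and `recognizers`, the effort labels, and the fallback to a "normal" model are all assumptions, and the recognizers are toy stand-ins rather than real acoustic models.

```python
from typing import Callable, Dict, Sequence

# Hypothetical types: a feature vector and a recognizer that maps it to text.
Features = Sequence[float]
Recognizer = Callable[[Features], str]


def select_recognizer(
    effort_scores: Dict[str, float],
    recognizers: Dict[str, Recognizer],
    fallback: str = "normal",
) -> Recognizer:
    """Return the recognizer for the highest-scoring vocal-effort label.

    effort_scores is assumed to come from some vocal-effort detector
    (for example, per-class likelihoods). If no dedicated model exists
    for the winning label, the fallback model is used instead.
    """
    best_label = max(effort_scores, key=effort_scores.get)
    return recognizers.get(best_label, recognizers[fallback])


if __name__ == "__main__":
    # Toy stand-ins for dedicated models (assumed, not real recognizers).
    models: Dict[str, Recognizer] = {
        "normal": lambda feats: "decoded with normal-speech model",
        "whisper": lambda feats: "decoded with whispered-speech model",
    }
    scores = {"normal": 0.2, "whisper": 0.8}  # made-up detector output
    recognizer = select_recognizer(scores, models)
    print(recognizer([0.0, 0.1]))
```

A multi-style system, by contrast, would skip this selection step and pass every utterance to a single model trained on mixed vocal efforts.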