ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414116

Vowel Non-Vowel Based Spectral Warping and Time Scale Modification for Improvement in Children’s ASR

Abstract: Acoustic differences between children's and adults' speech degrade automatic speech recognition performance when a system is trained on adults' speech and tested on children's speech. The key acoustic mismatch factors are formant frequencies, speaking rate, and pitch. In this paper, we propose a linear prediction based spectral warping method that uses knowledge of vowel and non-vowel regions in speech signals to mitigate the differences in formant frequencies between child and adult speakers. The…

Cited by 3 publications (2 citation statements)
References 23 publications
“…Both of these methods use a uniform warping factor for all the formants. In related work, LPC-based spectral warping was applied selectively on vowel and non-vowel locations along with time scale modification in [23]. More recently, a novel fundamental frequency-based frequency warping technique was proposed which was shown to improve the performance of children ASR when added with other data augmentation techniques [24].…”
Section: Related Work (mentioning)
confidence: 99%
“…2 (a) shows the mean scaling factor (%) for the first 3 formants, averaged across all the vowels of children speakers, with respect to adult male formants. Also, existing spectrum warping methods [20,21,23] do not consider spectral variabilities due to an underdeveloped vocal tract, such as varying formant's bandwidth and formant's energy with respect to the adult spectrum. Generally, these methods are used to normalize the children's speech or acoustic features to minimize the mismatch between train and test datasets but not as an augmentation.…”
Section: Related Work (mentioning)
confidence: 99%