2022
DOI: 10.1007/978-3-030-95070-5_8

Multi-style Training for South African Call Centre Audio

Abstract: Mismatched data is a challenging problem for automatic speech recognition (ASR) systems. One of the most common techniques used to address mismatched data is multi-style training (MTR), a form of data augmentation that attempts to transform the training data to be more representative of the testing data; and to learn robust representations applicable to different conditions. This task can be very challenging if the test conditions are unknown. We explore the impact of different MTR styles on system performance…

Cited by 2 publications (4 citation statements)
References 19 publications
“…It is extremely difficult to accurately create a noisy dataset that is well matched to a new set of conditions. The performance of a model using a dataset that was manually created is far worse than one with perfectly matched conditions [24]. When a small dataset with matched conditions is available, current GAN techniques cannot be used to reduce the mismatch, because a corresponding clean version does not exist (they require both the clean and noisy version to minimise the L1 or mean squared error loss).…”
Section: Discussion (mentioning)
confidence: 99%
“…The speed and volume of the training data are changed with 10% and 20%, respectively [24]. The speed is either increased or decreased with equal probability - approximately half of the utterances have a slower speed compared to the original set and the other half have a faster speed.…”
Section: Multi-style Training (mentioning)
confidence: 99%
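
The perturbation described in the statement above can be illustrated with a minimal Python sketch. It is not the cited authors' implementation: the function names (`perturb_speed`, `perturb_volume`, `augment`) are hypothetical, the speed change is done by plain linear-interpolation resampling (which also shifts pitch), and the assumption that volume is raised or lowered by 20% with equal probability is ours, since the statement only gives the direction rule for speed.

```python
import random


def perturb_speed(samples, factor):
    """Resample by linear interpolation so the audio plays `factor` times faster.

    factor > 1 shortens the utterance, factor < 1 lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, int(round(len(samples) / factor)))
    out = []
    for i in range(n_out):
        pos = i * factor          # fractional read position in the input
        lo = int(pos)
        if lo >= len(samples) - 1:
            out.append(samples[-1])  # clamp at the final sample
        else:
            frac = pos - lo
            out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


def perturb_volume(samples, gain):
    """Scale every sample by a constant gain."""
    return [s * gain for s in samples]


def augment(samples, rng, speed_delta=0.10, volume_delta=0.20):
    """Apply one random speed change and one random volume change.

    Speed goes up or down by `speed_delta` with equal probability, as in the
    quoted statement. Applying the same rule to volume is an assumption.
    """
    speed = 1.0 + speed_delta if rng.random() < 0.5 else 1.0 - speed_delta
    gain = 1.0 + volume_delta if rng.random() < 0.5 else 1.0 - volume_delta
    return perturb_volume(perturb_speed(samples, speed), gain), speed, gain
```

Calling `augment(utterance, random.Random(seed))` on each utterance would give roughly half the set a slower speed and half a faster one. A production pipeline would more likely rely on an audio toolkit's resampler than on this interpolation loop.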
“…Generative model GAN has been applied to enable automatic speech recognition (ASR), improving the features of mismatched data prior to decoding [30]. In another related work, the ASR system has been researched by combining multi-style training (MTR) with deep neural network hidden Markov model (DNN-HMM) [31]. The use of CNN in exploring classification accuracy on SNR data has been reported by Andrew et al. [32].…”
Section: Introduction (mentioning)
confidence: 99%