2022
DOI: 10.3390/s22145381

Two-Step Joint Optimization with Auxiliary Loss Function for Noise-Robust Speech Recognition

Abstract: In this paper, a new two-step joint optimization approach based on the asynchronous subregion optimization method is proposed for training a pipeline model composed of two different models. The first-step processing of the proposed joint optimization approach trains the front-end model only, and the second-step processing trains all the parameters of the combined model together. In the asynchronous subregion optimization method, the first-step processing only supports the goal of the front-end model. However, …
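To make the two-step schedule concrete, the following is a minimal PyTorch-style sketch of the training loop described in the abstract. The model classes, loss terms, and weights are illustrative assumptions (the paper's auxiliary loss function and architectures are not reproduced here), not the authors' exact configuration.

```python
# Minimal sketch of the two-step joint optimization schedule, assuming a
# speech-enhancement (SE) front-end and an ASR back-end. All architectures,
# losses, and hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn

class FrontEndSE(nn.Module):            # stand-in for the SE front-end (assumed)
    def __init__(self, bins=257):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(bins, bins), nn.ReLU(), nn.Linear(bins, bins))
    def forward(self, noisy):
        return self.net(noisy)

class BackEndASR(nn.Module):            # stand-in for the ASR back-end (assumed)
    def __init__(self, bins=257, vocab=30):
        super().__init__()
        self.rnn = nn.GRU(bins, 128, batch_first=True)
        self.out = nn.Linear(128, vocab)
    def forward(self, feats):
        h, _ = self.rnn(feats)
        return self.out(h)              # (batch, frames, vocab)

front, back = FrontEndSE(), BackEndASR()
se_loss, asr_loss = nn.MSELoss(), nn.CrossEntropyLoss()

# Toy batch: (batch, frames, bins) features and per-frame targets.
noisy  = torch.randn(4, 50, 257)
clean  = torch.randn(4, 50, 257)
labels = torch.randint(0, 30, (4, 50))

# Step 1: train the front-end model only.
opt1 = torch.optim.Adam(front.parameters(), lr=1e-3)
opt1.zero_grad()
se_loss(front(noisy), clean).backward()
opt1.step()

# Step 2: train all parameters of the combined pipeline together,
# keeping the SE loss as an auxiliary term (the 0.1 weight is an assumption).
opt2 = torch.optim.Adam(list(front.parameters()) + list(back.parameters()), lr=1e-4)
opt2.zero_grad()
enhanced = front(noisy)
loss = asr_loss(back(enhanced).transpose(1, 2), labels) + 0.1 * se_loss(enhanced, clean)
loss.backward()
opt2.step()
```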

Cited by 3 publications (6 citation statements)
References 51 publications
“…The ASR performance of each of the training approaches was evaluated by measuring the WER on both the validation and the test datasets. The WER of the ASR model trained using the proposed training approach was then compared to those of seven different approaches, as follows: (1) an ASR model trained via the MCT using the clean and noisy training datasets (denoted as MCT-noisy); (2) an SE model trained on the clean and noisy speech training datasets, wherein the enhanced signal was subsequently fed into the MCT-noisy ASR model (denoted as MCT-noisy + standalone-SE); (3) an ASR model trained by the MCT using the clean, noisy, and enhanced data from the standalone-SE datasets (denoted as MCT-all); (4) a combination of the SE and ASR models trained by conventional joint optimization (denoted as Joint-Straight) [9]; (5) a pipeline trained by ASO-based joint optimization (denoted as Joint-ASO) [18]; (6) a pipeline trained by Grad-based joint optimization (denoted as Joint-Grad) [20]; and (7) a pipeline trained by Token-based joint optimization (denoted as Joint-Token) [22].…”
Section: Performance Evaluation and Discussion
confidence: 99%
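For reference, WER in this comparison is the standard word-level edit distance normalized by the reference length; a minimal, self-contained implementation (the function name is ours) is sketched below.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance between the two word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sat mat"))  # ≈ 0.333
```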
“…In this section, the performance of the proposed training approach was evaluated for noise-robust ASR, and it was then compared with the performance of MCT and conventional joint training approaches, including asynchronous subregion optimization (ASO)-based joint optimization [18], gradient-surgery (Grad)-based joint optimization [19], and acoustic tokenizer (Token)-based joint optimization [22]. Here, the ASO-based joint optimization approach was first used to train a pipeline with the SE and ASR encoder losses, and it was then further trained with the combination of the SE and ASR losses.…”
Section: Methods
confidence: 99%
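The two ASO training phases mentioned in this statement can be summarized as a simple loss schedule; the weights and the exact form of the ASR encoder loss below are assumptions for illustration only.

```python
# Hypothetical loss schedule for the ASO-style two-phase training described above.
def phase1_loss(l_se, l_asr_encoder, beta=1.0):
    # Phase 1: SE loss plus an auxiliary loss computed on the ASR encoder output.
    return l_se + beta * l_asr_encoder

def phase2_loss(l_se, l_asr, alpha=0.1):
    # Phase 2: the full ASR loss combined with a down-weighted SE loss.
    return l_asr + alpha * l_se
```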
“…In this section, we evaluate the performance of the proposed KD-based training approach for noise-robust ASR and compare it with the MCT and conventional joint training approaches, including ASO-based joint optimization [25] and gradient-remedy-based joint optimization [24]. The ASR and SE performances were measured using two different datasets.…”
Section: Performance Evaluation
confidence: 99%
“…When optimizing the entire pipeline, challenges arise because of conflicting gradients, leading to a convergence issue referred to as the gradient-conflict problem [23], [24]. To address this problem, several training approaches have been studied, such as the asynchronous subregion optimization (ASO)-based approach [25], [26] and the gradient surgery-based approach [24], [27]. Although these approaches yield promising results, they suffer from frame mismatching between SE and ASR [28], which mainly stems from the different objectives of SE and ASR.…”
confidence: 99%
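The gradient surgery mentioned here can be illustrated with a PCGrad-style projection: when the SE and ASR gradients conflict (negative inner product), the conflicting component is removed before the gradients are summed. The sketch below is illustrative and does not reproduce the exact rule of the cited gradient-remedy approach.

```python
import torch

def combine_with_gradient_surgery(g_se: torch.Tensor, g_asr: torch.Tensor) -> torch.Tensor:
    """PCGrad-style projection (illustrative): if the flattened SE and ASR gradients
    conflict, project the SE gradient onto the normal plane of the ASR gradient."""
    dot = torch.dot(g_se, g_asr)
    if dot < 0:  # conflicting directions
        g_se = g_se - dot / (g_asr.norm() ** 2 + 1e-12) * g_asr
    return g_se + g_asr

# Toy example with conflicting gradient directions.
g_se  = torch.tensor([1.0, -2.0])
g_asr = torch.tensor([1.0,  1.0])
print(combine_with_gradient_surgery(g_se, g_asr))  # tensor([2.5000, -0.5000])
```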