2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
DOI: 10.1109/waspaa.2019.8937250

Speech Enhancement Using End-to-End Speech Recognition Objectives

Cited by 50 publications (17 citation statements)
References: 20 publications

“…And in this paper, we mainly focus on the multi-speaker case, which is a more difficult task. It is worth noting that this end-to-end architecture is optimized only based on the final ASR criterion, which was also proven feasible in previous works [16, 24–26]. Our experiments show that our newly proposed method outperformed the conventional end-to-end ASR systems [24,25,27] in both single-speaker and multi-speaker reverberant conditions.…”
Section: Introduction (supporting; confidence: 52%)
“…ASR Multi-task Training: There are prior studies that employ joint or multi-task training for the front-end and back-end [12,13,14,15,16,17,18,19,20,21,22]. However, they were concerned only with ASR accuracy and neither paid close attention to SE quality nor analyzed the trade-off between the two tasks.…”
Section: Related Work (mentioning; confidence: 99%)
“…With the popularity of speech-related intelligent devices and related applications, front-end processing has become a popular research topic [1]. In particular, a number of methods based on end-to-end deep learning have emerged to solve the cocktail party problem [2]–[7]. Compared to conventional approaches like computational auditory scene analysis [8], [9] and non-negative matrix factorization [10], end-to-end models are entirely data-driven, achieving remarkable improvement in speech quality and intelligibility.…”
Section: Introduction (mentioning; confidence: 99%)