Interspeech 2018
DOI: 10.21437/interspeech.2018-2409

Cycle-Consistent Speech Enhancement

Abstract: Feature mapping using deep neural networks is an effective approach to single-channel speech enhancement. Noisy features are transformed into enhanced ones through a mapping network, and the mean square error between the enhanced and clean features is minimized. In this paper, we propose cycle-consistent speech enhancement (CSE), in which an additional inverse mapping network is introduced to reconstruct the noisy features from the enhanced ones. A cycle-consistent constraint is enforced to minimize the r…
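For intuition, here is a minimal PyTorch sketch of the training objective the abstract describes, assuming MSE for both the enhancement and reconstruction losses; the network shapes, the make_mapper helper, and the lambda_cycle weight are illustrative choices, not the authors' architecture.

```python
import torch
import torch.nn as nn

FEAT_DIM = 80  # illustrative feature dimension, not from the paper

def make_mapper():
    # Stand-in mapping network: a small feed-forward stack.
    return nn.Sequential(
        nn.Linear(FEAT_DIM, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, FEAT_DIM),
    )

G = make_mapper()  # forward mapping: noisy -> enhanced
F = make_mapper()  # inverse mapping: enhanced -> reconstructed noisy
mse = nn.MSELoss()
opt = torch.optim.Adam(list(G.parameters()) + list(F.parameters()), lr=1e-4)

def cse_step(noisy, clean, lambda_cycle=1.0):
    """One training step: enhancement loss plus cycle-consistency loss."""
    enhanced = G(noisy)
    reconstructed = F(enhanced)
    # Enhancement MSE pulls G(noisy) toward clean; the cycle term forces
    # F(G(noisy)) to reconstruct the original noisy features.
    loss = mse(enhanced, clean) + lambda_cycle * mse(reconstructed, noisy)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy batch of (frames x features); real training would use parallel
# noisy/clean utterances.
noisy = torch.randn(32, FEAT_DIM)
clean = torch.randn(32, FEAT_DIM)
print(cse_step(noisy, clean))
```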

Cited by 46 publications (28 citation statements)
References 40 publications

“…ASR suffers from performance degradation when a well-trained acoustic model is applied in a new domain [19]. T/S learning [3,8,9] and adversarial learning [20,21,22,23,24] are two effective approaches that can suppress this domain mismatch by adapting a source-domain acoustic model to target-domain speech. T/S learning is better suited to the situation where unlabeled parallel data is available for adaptation, in which a sequence of source-domain speech features is fed to the source-domain teacher model while the parallel sequence of target-domain features is fed to the target-domain student model, and the student model parameters are optimized by minimizing the T/S loss in Eq.…”
Section: Conditional T/S Learning for Domain Adaptation
Citation type: mentioning; confidence: 99%
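As a rough illustration of the T/S adaptation recipe this excerpt outlines, the sketch below assumes a KL-divergence T/S loss over senone posteriors and a toy feed-forward acoustic model; make_am, ts_step, and the layer sizes are hypothetical names and values, not from the cited papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, NUM_SENONES = 80, 4096  # illustrative sizes

def make_am():
    # Stand-in acoustic model producing per-frame senone logits.
    return nn.Sequential(nn.Linear(FEAT_DIM, 512), nn.ReLU(),
                         nn.Linear(512, NUM_SENONES))

teacher = make_am()  # trained on the source domain; kept frozen
student = make_am()  # initialized from the teacher, then adapted
student.load_state_dict(teacher.state_dict())
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

def ts_step(src_feats, tgt_feats):
    """Teacher sees source-domain frames; the student sees the parallel
    target-domain frames; minimize KL between their posteriors."""
    with torch.no_grad():
        teacher_post = F.softmax(teacher(src_feats), dim=-1)
    student_logp = F.log_softmax(student(tgt_feats), dim=-1)
    # T/S loss: the student matches the teacher's soft senone targets,
    # so no transcriptions of the adaptation data are needed.
    loss = F.kl_div(student_logp, teacher_post, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```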
“…Identity-mapping loss: We regularize generators G and F to be close to identity mappings by minimizing an identity-mapping loss, as in [21]. This loss preserves the composition (i.e., linguistic information) of the input source features and the target ones [22], [23], and helps the generators better match the target distribution.…”
Section: Cycle-Consistent Relativistic GAN for SE
Citation type: mentioning; confidence: 99%
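Below is a minimal sketch of such an identity-mapping regularizer, assuming L1 distance and linear stand-in generators; lambda_id and F_gen are illustrative names, and a real CycleGAN-style SE system would add this term to its adversarial and cycle-consistency losses.

```python
import torch
import torch.nn as nn

FEAT_DIM = 80  # illustrative feature dimension

# Stand-in generators; in a CycleGAN-style SE system G maps noisy -> clean
# and F_gen maps clean -> noisy.
G = nn.Linear(FEAT_DIM, FEAT_DIM)
F_gen = nn.Linear(FEAT_DIM, FEAT_DIM)
l1 = nn.L1Loss()

def identity_loss(noisy, clean, lambda_id=5.0):
    """Identity-mapping regularizer: each generator, fed a sample that is
    already in its *output* domain, should return it (nearly) unchanged."""
    return lambda_id * (l1(G(clean), clean) + l1(F_gen(noisy), noisy))

noisy = torch.randn(32, FEAT_DIM)
clean = torch.randn(32, FEAT_DIM)
print(identity_loss(noisy, clean).item())
```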
“…Instead of explicitly minimizing L1/L2 losses, which can produce over-smoothed results, the discriminator provides a high-level, abstract measure of the "realness" of the generated images. This idea has been applied to SE tasks with parallel [18][19][20][21][22][23] or non-parallel corpora [24,25].…”
Section: Related Work
Citation type: mentioning; confidence: 99%
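The sketch below illustrates that idea with an LSGAN-style objective (least-squares adversarial loss), which is one common choice; the architectures and the gan_step helper are assumptions for illustration, not the method of any specific cited paper.

```python
import torch
import torch.nn as nn

FEAT_DIM = 80  # illustrative

G = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                  nn.Linear(256, FEAT_DIM))   # enhancer (generator)
D = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                  nn.Linear(256, 1))          # "realness" score
mse = nn.MSELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def gan_step(noisy, clean):
    """One LSGAN-style update: D scores clean frames as real (1) and
    enhanced frames as fake (0); G tries to make D score them as real."""
    fake = G(noisy)

    # Discriminator update (generator output detached).
    d_loss = mse(D(clean), torch.ones(clean.size(0), 1)) + \
             mse(D(fake.detach()), torch.zeros(noisy.size(0), 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: no explicit L1/L2 target, only D's judgment.
    g_loss = mse(D(fake), torch.ones(noisy.size(0), 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

noisy = torch.randn(32, FEAT_DIM)
clean = torch.randn(32, FEAT_DIM)
print(gan_step(noisy, clean))
```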