2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru.2017.8268938

Unsupervised adaptation with domain separation networks for robust speech recognition

Abstract: Unsupervised domain adaptation of speech signals aims at adapting a well-trained source-domain acoustic model to unlabeled data from the target domain. This can be achieved by adversarial training of deep neural network (DNN) acoustic models to learn an intermediate deep representation that is both senone-discriminative and domain-invariant. Specifically, the DNN is trained to jointly optimize the primary task of senone classification and the secondary task of domain classification with adversarial objective functions…
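A minimal sketch of the adversarial multi-task objective described above, assuming a PyTorch-style setup. Layer sizes, module names, and the mixing weight lambda_adv are illustrative, and the sketch omits the private encoders and reconstruction loss of the full domain separation network; it only shows how a shared encoder can be pushed toward a senone-discriminative, domain-invariant representation via gradient reversal.

    # Sketch of adversarial multi-task training for domain-invariant acoustic
    # modelling (illustrative PyTorch-style code; sizes and names are assumptions).
    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; negates (and scales) the gradient in the
        backward pass so the encoder is trained to confuse the domain classifier."""
        @staticmethod
        def forward(ctx, x, lambda_adv):
            ctx.lambda_adv = lambda_adv
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambda_adv * grad_output, None

    class AdversarialAM(nn.Module):
        def __init__(self, feat_dim=440, hidden=1024, n_senones=3000, lambda_adv=0.1):
            super().__init__()
            self.lambda_adv = lambda_adv
            self.encoder = nn.Sequential(
                nn.Linear(feat_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU())
            self.senone_head = nn.Linear(hidden, n_senones)  # primary task
            self.domain_head = nn.Linear(hidden, 2)          # source vs. target

        def forward(self, x):
            h = self.encoder(x)
            senone_logits = self.senone_head(h)
            domain_logits = self.domain_head(GradReverse.apply(h, self.lambda_adv))
            return senone_logits, domain_logits

    def train_step(model, opt, src_x, src_senones, tgt_x):
        """One step: senone labels exist only for source-domain frames,
        domain labels (0 = source, 1 = target) exist for both domains."""
        ce = nn.CrossEntropyLoss()
        s_logits, s_dom = model(src_x)
        _, t_dom = model(tgt_x)
        dom_logits = torch.cat([s_dom, t_dom])
        dom_labels = torch.cat([torch.zeros(len(src_x), dtype=torch.long),
                                torch.ones(len(tgt_x), dtype=torch.long)])
        loss = ce(s_logits, src_senones) + ce(dom_logits, dom_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

The gradient reversal layer lets the same backward pass minimize the domain loss with respect to the domain classifier while maximizing it with respect to the shared encoder, which is what makes the learned representation domain-invariant.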

Cited by 53 publications (44 citation statements)
References 27 publications

“…ASR suffers from performance degradation when a well-trained acoustic model is applied in a new domain [19]. T/S learning [3,8,9] and adversarial learning [20,21,22,23,24] are two effective approaches that can suppress this domain mismatch by adapting a source-domain acoustic model to target-domain speech. T/S learning is more suited for the situation where unlabeled parallel data is available for adaptation, 2 in which a sequence of source-domain speech features is fed as the input to a source-domain teacher model and a parallel sequence of target-domain features is at the input to the target-domain student model to optimize the student model parameters by minimizing the T/S loss in Eq.…”
Section: Conditional T/S Learning for Domain Adaptation
confidence: 99%
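A hedged sketch of the T/S adaptation loss this excerpt refers to, assuming frame-level senone posteriors and a parallel pair of source-domain and target-domain feature sequences; function and variable names are illustrative, not taken from the cited papers.

    # Teacher/student (T/S) adaptation loss: the teacher sees source-domain
    # features, the student sees the parallel target-domain features, and the
    # student is trained to match the teacher's senone posteriors (no labels needed).
    import torch
    import torch.nn.functional as F

    def ts_loss(student, teacher, tgt_feats, src_feats):
        with torch.no_grad():
            teacher_post = F.softmax(teacher(src_feats), dim=-1)   # soft targets
        student_logp = F.log_softmax(student(tgt_feats), dim=-1)
        # Cross-entropy between teacher and student posteriors, equivalent to
        # KL divergence up to the teacher's (constant) entropy.
        return -(teacher_post * student_logp).sum(dim=-1).mean()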
“…To improve the ASR performance, we enhance the 9137 noisy utterances in CHiME-3 with ACSE and re-train the clean DNN-HMM acoustic model in [31]. We use the same senone-level forced alignments as the clean model for re-training.…”
Section: Acoustic Model Re-training
confidence: 99%
“…The authors in [15,16] adapt their acoustic model to test data by learning an adaptation function between the hidden unit contributions of the training data and the development data. However, as pointed by [17], these methods require reliable tri-phone alignment and this may not always be successful in a mismatched condition.…”
Section: Related Work
confidence: 99%
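The adaptation scheme in [15,16] that this excerpt describes resembles learning hidden unit contributions (LHUC); the rough sketch below shows the general idea under that assumption (per-unit amplitudes re-learned on adaptation data while the source model stays frozen), and the exact adaptation function used in those papers may differ.

    # LHUC-style adaptation sketch (illustrative; not the exact method of [15,16]).
    import torch
    import torch.nn as nn

    class LHUCLayer(nn.Module):
        def __init__(self, base_layer, hidden_dim):
            super().__init__()
            self.base = base_layer                 # layer from the source model
            for p in self.base.parameters():       # source weights stay frozen;
                p.requires_grad_(False)            # only the amplitudes adapt
            self.r = nn.Parameter(torch.zeros(hidden_dim))

        def forward(self, x):
            # 2*sigmoid(r) keeps each unit's scale in (0, 2); r = 0 recovers
            # the unadapted source model exactly.
            return 2.0 * torch.sigmoid(self.r) * self.base(x)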
“…Recently, there has been a lot of work [17,18,19] using adversarial training to adapt to the target domain data in a completely unsupervised way. The authors in [18] and [19] use a gradient reversal layer to train a domain classifier and try to learn domain invariant representations by passing a negative gradient on classification of domains.…”
Section: Related Work
confidence: 99%