2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2011.5947375

Speaker and noise factorisation on the AURORA4 task

Abstract: For many realistic scenarios, there are multiple factors that affect the clean speech signal. In this work approaches to handling two such factors, speaker and background noise differences, simultaneously are described. A new adaptation scheme is proposed. Here the acoustic models are first adapted to the target speaker via an MLLR transform. This is followed by adaptation to the target noise environment via model-based vector Taylor series (VTS) compensation. These speaker and noise transforms are jointly est…
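
The adaptation order described in the abstract (MLLR to the speaker first, then model-based VTS to the noise) can be illustrated for a single Gaussian mean. The following is a minimal sketch, not the paper's implementation. It assumes the commonly used log-spectral mismatch function y = x + h + log(1 + exp(n - x - h)), with the DCT taken as identity, and compensates only the mean at zeroth order. All function and parameter names (`mllr_adapt_mean`, `vts_compensate_mean`, `joint_adapt_mean`, `mu_n`, `mu_h`) are hypothetical. The joint estimation of the two transforms mentioned in the abstract is not shown.

```python
import numpy as np


def mllr_adapt_mean(mu_x, A, b):
    """MLLR mean adaptation: affine transform of a clean-speech Gaussian mean."""
    return A @ mu_x + b


def vts_compensate_mean(mu_x, mu_n, mu_h):
    """Zeroth-order VTS mean compensation in the log-spectral domain.

    Assumed mismatch function: y = x + h + log(1 + exp(n - x - h)),
    where n is additive noise and h is the convolutional (channel) term.
    np.logaddexp(0, z) evaluates log(1 + exp(z)) in a numerically stable way.
    """
    return mu_x + mu_h + np.logaddexp(0.0, mu_n - mu_x - mu_h)


def joint_adapt_mean(mu_x, A, b, mu_n, mu_h):
    """Apply the speaker transform first, then noise compensation on top of it."""
    mu_speaker = mllr_adapt_mean(mu_x, A, b)
    return vts_compensate_mean(mu_speaker, mu_n, mu_h)


if __name__ == "__main__":
    mu_x = np.array([1.0, 2.0, 3.0])   # illustrative clean-speech mean
    A = np.eye(3)                      # illustrative speaker transform
    b = np.zeros(3)
    mu_n = np.array([0.0, 2.0, 5.0])   # illustrative noise mean
    mu_h = np.zeros(3)                 # no channel distortion
    print(joint_adapt_mean(mu_x, A, b, mu_n, mu_h))
```

With very low noise the compensated mean approaches the speaker-adapted mean, and with dominant noise it approaches the noise mean, which is the behaviour the mismatch function is meant to capture.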

Cited by 21 publications (13 citation statements)
References 11 publications
“…The configuration for estimating VTS transforms was the same as that used for the AURORA-4 task in [19]. For reference the performance on the original AURORA 4 data for these three test sets were 6.9% (01), 19.5% (04) and 11.8% (08).…”
Section: Aurora-4 Task (mentioning)
confidence: 99%
“…This is not optimal if considering the nonlinear nature of the mismatch function relating the clean speech and the noisy speech. In a recent work [9], two combination schemes of MLLR and VTS are considered. One combination called "VTS+MLLR" conducts MLLR on top of the standard VTS.…”
Section: Introduction (mentioning)
confidence: 99%
“…The "Joint" scheme replaces the clean speech model used in the VTS with a speaker-adapted clean speech model by MLLR transform. It is discovered that the speaker's MLLR transform estimated from the noisy speech using the "Joint" scheme still models some of the limitations of the VTS mismatch function [9], i.e. carries information about current noise characteristics.…”
Section: Introduction (mentioning)
confidence: 99%
“…More recently, a series of studies has been developed, in which speaker and background noise effects are separately characterized using specific transforms. Well-known methods include factorized adaptation [29] and acoustic factorization algorithms [30], [31].…”
Section: Introduction (mentioning)
confidence: 99%