2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012
DOI: 10.1109/icassp.2012.6288961

Combining eigenvoice speaker modeling and VTS-based environment compensation for robust speech recognition

Abstract: Eigenvoice and vector Taylor series (VTS) are effective models of speaker differences and environmental variations, respectively. However, speaker and environmental variations always coexist in real-world speech. In this paper, we propose to combine eigenvoice and VTS. Specifically, we introduce eigenvoice speaker modeling of the clean speech into VTS's nonlinear mismatch function. In contrast, standard VTS uses speaker-independent modeling to represent the clean speech, regardless of speaker differences. The eig…
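As a rough illustration of the idea in the abstract, the sketch below assumes the textbook forms of both models, which the abstract itself does not spell out: an eigenvoice clean-speech mean μ_s = μ_0 + Σ_k w_k e_k, and the log-spectral VTS mismatch function y = x + h + log(1 + exp(n − x − h)), where x, h and n are the clean speech, channel and additive noise (the cepstral version additionally applies a DCT and its inverse). Under a first-order VTS expansion around the means, the noisy-speech mean is approximated by evaluating the mismatch function at the clean, channel and noise means. All function names here are hypothetical, and this is a simplified sketch, not the paper's implementation.

```python
import math
from typing import List, Sequence


def softplus(a: float) -> float:
    """Numerically stable log(1 + exp(a)); avoids overflow for large a."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def eigenvoice_mean(
    mean0: Sequence[float],
    eigenvoices: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Speaker-dependent clean-speech mean: mu_s = mu_0 + sum_k w_k * e_k."""
    if len(eigenvoices) != len(weights):
        raise ValueError("need exactly one weight per eigenvoice")
    mu = list(mean0)
    for w, e in zip(weights, eigenvoices):
        for d in range(len(mu)):
            mu[d] += w * e[d]
    return mu


def vts_mismatch(
    x: Sequence[float], h: Sequence[float], n: Sequence[float]
) -> List[float]:
    """Log-spectral mismatch y = x + h + log(1 + exp(n - x - h)), per dimension.

    x: clean speech, h: convolutive channel, n: additive noise (log-spectral).
    """
    return [xd + hd + softplus(nd - xd - hd) for xd, hd, nd in zip(x, h, n)]


def noisy_mean(
    mean0: Sequence[float],
    eigenvoices: Sequence[Sequence[float]],
    weights: Sequence[float],
    mu_h: Sequence[float],
    mu_n: Sequence[float],
) -> List[float]:
    """First-order VTS estimate of the noisy mean, taking the clean mean from
    the (hypothetical) eigenvoice model instead of a speaker-independent mean."""
    mu_x = eigenvoice_mean(mean0, eigenvoices, weights)
    return vts_mismatch(mu_x, mu_h, mu_n)
```

Setting all eigenvoice weights to zero recovers the speaker-independent clean mean that standard VTS uses, which is the contrast the abstract draws; non-zero weights shift the clean mean per speaker before the environment distortion is applied.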

Cited by 1 publication (2 citation statements); references 9 publications.
“…(12). (3) The environment transforms are updated while keeping all the other parameters fixed. (4) The speaker clusters are updated while keeping all the other parameters fixed.…”
Section: Factorised Cluster Adaptive Training (mentioning)
“…This requires adaptation data for all possible operating conditions. An alternative approach, acoustic factorisation, first proposed in 2001 [1], has very recently been adopted by a number of sites, e.g., [2,3,4,5,6]. In parallel with the factorisation approach in speech recognition, there is also work along this line in speech synthesis, e.g., [7,8], where the goal is to synthesise the effect of multiple factors, such as speaker, language and emotion.…”
Section: Introduction (mentioning)