2013 IEEE Workshop on Automatic Speech Recognition and Understanding
DOI: 10.1109/asru.2013.6707705

Speaker adaptation of neural network acoustic models using i-vectors

Cited by 569 publications (392 citation statements)
References: 13 publications

“…(3) As ability to express complex objective functions in network, shallow structure neural network sometimes cannot well realize the complex high-dimensional function. (4) In terms of computational complexity of the network structure, with the depth of k compact network structure to express a certain function, in the depth of less than the network structure to express the function of k, it may need to increase the number of the exponential scale factor calculation, greatly increase the complexity of the calculation [31][32][33][34].…”
Section: The Modified Deep Convolution Neural Network (mentioning, confidence: 99%)

“…Among them, establishing a more appropriate combination of frame posteriors obtained in DNNs; exploring different fusions among DNNs and ivector systems [39]; and dealing with unbalanced training data. Note that even though we proposed different ways of combining posteriors, all of them were blind (no need for training), as we focused on real-time applications and simple approaches.…”
Section: Future Work (mentioning, confidence: 99%)

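The excerpt does not say which blind (training-free) combination rules the citing work used. As an illustration only, the sketch below assumes three common rules for turning frame-level DNN posteriors into one utterance-level score vector: averaging the posteriors, averaging the log posteriors, and majority voting over each frame's top class. None of these rules involves trained parameters. The function name `combine_posteriors` and the toy posterior matrix are hypothetical.

```python
import numpy as np


def combine_posteriors(post: np.ndarray, method: str = "mean") -> np.ndarray:
    """Blindly combine frame posteriors of shape (T, C) into a (C,) score vector.

    Hypothetical helper: the rules below are common training-free choices,
    not necessarily the ones used in the cited work.
    """
    if post.ndim != 2:
        raise ValueError("expected posteriors of shape (T, C)")
    if method == "mean":
        # Arithmetic mean of the frame posteriors.
        return post.mean(axis=0)
    if method == "log_mean":
        # Mean of log posteriors (a geometric-mean style rule); clip avoids log(0).
        return np.log(np.clip(post, 1e-10, None)).mean(axis=0)
    if method == "vote":
        # Fraction of frames whose most likely class is each class.
        counts = np.bincount(post.argmax(axis=1), minlength=post.shape[1])
        return counts / post.shape[0]
    raise ValueError(f"unknown method: {method}")


# Toy posteriors for 3 frames and 3 classes (illustrative values only).
post = np.array([[0.7, 0.2, 0.1],
                 [0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3]])
for m in ("mean", "log_mean", "vote"):
    print(m, combine_posteriors(post, m).argmax())
```

Averaging log posteriors penalizes classes that receive very low probability on any frame, whereas voting ignores how confident each frame is; which rule works best is an empirical question.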
“…Some studies [22], [23] have shown that performance can be improved by supplying complementary features as inputs to the network in parallel with the regular acoustic features for speech recognition. Motivated by the above work, we propose augmenting the acoustic features from microphone array with the spatial information to further improve the performance of multichannel speech recognition.…”
Section: Introduction (mentioning, confidence: 99%)
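The paper indexed on this page adapts DNN acoustic models in the way this excerpt describes: it supplies speaker i-vectors to the network as input features in parallel with the regular acoustic features. The sketch below shows that input augmentation under stated assumptions. It assumes one fixed i-vector per speaker (or utterance), tiled across all frames and concatenated to each frame's feature vector. The helper name `append_ivector` and the 40-dimensional features and 100-dimensional i-vector are illustrative choices, not values taken from the paper.

```python
import numpy as np


def append_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Concatenate one speaker-level i-vector to every acoustic frame.

    Hypothetical helper. frames: (T, D) frame-level features; ivector: (K,)
    speaker or utterance i-vector. Returns an array of shape (T, D + K).
    """
    if frames.ndim != 2 or ivector.ndim != 1:
        raise ValueError("expected frames of shape (T, D) and ivector of shape (K,)")
    # Repeat the same i-vector for every frame: shape (T, K).
    tiled = np.tile(ivector, (frames.shape[0], 1))
    # Place it alongside the regular acoustic features: shape (T, D + K).
    return np.concatenate([frames, tiled], axis=1)


# Illustrative sizes: 5 frames, 40-dim features, 100-dim i-vector (assumed).
T, D, K = 5, 40, 100
rng = np.random.default_rng(0)
frames = rng.standard_normal((T, D))
ivec = rng.standard_normal(K)
x = append_ivector(frames, ivec)
print(x.shape)
```

Because the i-vector is constant within a speaker, the augmented input lets the network condition its hidden activations on speaker identity without changing the acoustic features themselves.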