2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2014.6855085
An auto-encoder based approach to unsupervised learning of subword units

Cited by 66 publications (54 citation statements)
References 8 publications
“…The stacked network is trained one layer at a time, each layer minimizing the loss of its output with respect to its input. A number of studies have shown that hidden representations from an intermediate layer in such a stacked AE are useful as features in speech applications [31], [33]–[38].…”
Section: Autoencoder Features
confidence: 99%
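The greedy layer-wise training described in the excerpt above can be sketched in plain NumPy. This is a minimal illustration, not the cited papers' implementation: the linear layers, batch gradient descent, layer sizes, and random "acoustic frames" are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_ae_layer(X, hidden_dim, lr=0.01, epochs=200):
    """Train one linear autoencoder layer by batch gradient descent,
    minimizing the squared reconstruction error of its own input."""
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, hidden_dim))
    W_dec = rng.normal(scale=0.1, size=(hidden_dim, d))
    for _ in range(epochs):
        H = X @ W_enc                      # hidden code
        E = H @ W_dec - X                  # reconstruction error
        g_dec = (H.T @ E) / n
        g_enc = (X.T @ (E @ W_dec.T)) / n
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc
    return W_enc

# Greedy stacking: each layer is trained to reconstruct the output of
# the layer below, then frozen; its codes feed the next layer.
X = rng.normal(size=(500, 39))             # e.g. 39-dim acoustic frames
layers, inp = [], X
for h in (30, 20):                         # hypothetical layer sizes
    W = train_ae_layer(inp, h)
    layers.append(W)
    inp = inp @ W

features = inp                             # intermediate-layer representation
print(features.shape)                      # (500, 20)
```

After all layers are trained, the activations of an intermediate layer (here `features`) are what the quoted studies reuse as frame-level speech features.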
“…By presenting the same data at the input and the output of the network while constraining intermediate connections, the network is trained to find an internal representation that is useful for reconstruction. These internal representations can be useful as features [36]–[41]. Like BNFs, autoencoders can be trained on languages different from the target language (often resulting in more data to train on).…”
Section: Autoencoder Features
confidence: 99%
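The bottleneck idea in this excerpt can be illustrated with a single nonlinear autoencoder: the same frames appear at input and output, a narrow hidden layer forces a compressed code, and that code is later reused as a feature vector. A minimal sketch follows; the sigmoid hidden layer, dimensions, and the "other language" split are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bottleneck_ae(X, hidden_dim, lr=0.1, epochs=300):
    """Present the same frames at input and output; the narrow
    hidden layer forces a compressed internal representation."""
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, hidden_dim))
    W2 = rng.normal(scale=0.1, size=(hidden_dim, d))
    for _ in range(epochs):
        H = sigmoid(X @ W1)
        E = (H @ W2 - X) / n               # per-sample reconstruction error
        g2 = H.T @ E
        g1 = X.T @ ((E @ W2.T) * H * (1.0 - H))
        W2 -= lr * g2
        W1 -= lr * g1
    return W1, W2

# As the excerpt notes, the autoencoder may be trained on frames from a
# language other than the target one (often giving more training data).
X_other = rng.normal(size=(400, 13))       # hypothetical non-target frames
W1, _ = train_bottleneck_ae(X_other, hidden_dim=10)

X_target = rng.normal(size=(100, 13))      # hypothetical target frames
feats = sigmoid(X_target @ W1)             # hidden codes reused as features
print(feats.shape)                         # (100, 10)
```

The encoder weights learned on the non-target data are simply applied to target-language frames; nothing about the feature extraction step depends on which language produced the training frames.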
“…Features should ideally disregard irrelevant information (such as speaker and gender), while capturing linguistically meaningful contrasts (such as phone or word categories). Several different unsupervised frame-level acoustic feature learning methods have been developed over the last few years [6]–[12], with neural networks being used in a number of studies [13]–[17].…”
Section: Introduction
confidence: 99%