2014
DOI: 10.1109/lsp.2014.2302080
Large Vocabulary Continuous Speech Recognition With Reservoir-Based Acoustic Models

Abstract: Thanks to research in neural network based acoustic modeling, progress in Large Vocabulary Continuous Speech Recognition (LVCSR) seems to have gained momentum recently. In search for further progress, the present letter investigates Reservoir Computing (RC) as an alternative new paradigm for acoustic modeling. RC unifies the appealing dynamical modeling capacity of a Recurrent Neural Network (RNN) with the simplicity and robustness of linear regression as a model for training the weights of that network. In pr…
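The RC paradigm the abstract describes can be illustrated with a minimal Echo State Network: a fixed random recurrent reservoir whose only trained part is a linear readout fitted by ridge regression. The sketch below is an assumption-laden toy (reservoir size, scaling, and the sine-prediction task are all illustrative choices), not the letter's actual acoustic model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 100                                   # reservoir size (illustrative)

W_in = rng.uniform(-0.5, 0.5, (n_res, 1))     # fixed random input weights
W = rng.uniform(-0.5, 0.5, (n_res, n_res))    # fixed random recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # scale spectral radius below 1

def run_reservoir(u):
    """Drive the fixed reservoir with scalar inputs u and collect its states."""
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, u_t in enumerate(u):
        x = np.tanh(W_in[:, 0] * u_t + W @ x)
        states[t] = x
    return states

# Toy task: predict a phase-shifted sine from a sine input.
t = np.linspace(0, 20, 500)
u, y = np.sin(t), np.sin(t + 0.2)
X = run_reservoir(u)

# Training touches only the linear readout, via ridge regression;
# the recurrent weights W and input weights W_in stay fixed and random.
ridge = 1e-6
w_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)

mse = np.mean((X[100:] @ w_out - y[100:]) ** 2)  # evaluate after a washout period
print(f"readout MSE after washout: {mse:.2e}")
```

This is the appeal the abstract points at: the hard part of RNN training (recurrent credit assignment) is avoided entirely, and fitting reduces to one linear least-squares solve.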

Cited by 27 publications (17 citation statements)
References 45 publications
“…Echo State Networks (ESNs), proposed by Herbert Jaeger [15], are a special kind of Recurrent Neural Networks (RNNs) and have achieved comparable results to CNNs in several recognition tasks, such as speech and image recognition [16], [17]. However, they are fairly new in the context of Music Information Retrieval (MIR).…”
Section: Introduction
mentioning
confidence: 99%
“…Note that the masking scheme and BP algorithm are easily extended to multidimensional input and output sequences (more details are provided in the Supplementary material). The TIMIT task has been studied before in the context of RC, which has shown it to be challenging, typically requiring extremely large reservoirs to obtain competitive performance [26,27]. For all these tasks we compared the performance of the fully trained system to traditional RC, where we kept the input and bias masks fixed and random, and only optimised their global scaling and the feedback strength parameter µ.…”
mentioning
confidence: 99%
“…[Spilled Table 2 entries — system (ref.): error rates: (name truncated) [16]: 8.5 / 6.5; RC-HMM [17]: 6.2 / 3.9; GMM-HMM (ML) [17]: 6.0 / 3.8; GMM-HMM (MMI+VTLN) [16]: - / 3.0; DNN-HMM (STC features) [18]: 5.2 / -] Table 2 shows a recapitulation of the key performances of some state-of-the-art systems in the field. The first two systems in the list are based on GMM-HMM acoustic models: the first one was trained using the maximum-likelihood (ML) criterion [17], while the second one used the maximum mutual information (MMI) criterion with vocal tract length normalization (VTLN) [16]. Triefenbach et al. [17] also proposed a Reservoir Computing (RC) HMM hybrid system for phoneme recognition using a bigram phonotactic utterance model. The RC-HMM performs significantly better than the MLP-HMM hybrids proposed by Gemello et al. [19].…”
Section: Read Continuous Speech Recognition Task
mentioning
confidence: 99%