Interspeech 2018
DOI: 10.21437/interspeech.2018-1407

Twin Regularization for Online Speech Recognition

Abstract: Online speech recognition is crucial for developing natural human-machine interfaces. This modality, however, is significantly more challenging than off-line ASR, since real-time/low-latency constraints inevitably hinder the use of future information, which is known to be very helpful for robust predictions. A popular solution to mitigate this issue consists of feeding neural acoustic models with context windows that gather some future frames. This introduces a latency which depends on the number of employ…
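The context-window mechanism the abstract refers to can be sketched as follows; the frame shift, window sizes, and feature dimensionality here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def make_context_windows(feats, left=5, right=5):
    """Stack each frame with `left` past and `right` future frames.

    feats: (T, D) array of per-frame acoustic features.
    Returns: (T, (left + right + 1) * D) array.
    Edge frames are handled by replicating the first/last frame.
    """
    T, D = feats.shape
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(left + right + 1)])

feats = np.random.randn(100, 40)               # 100 frames of 40-dim features
windows = make_context_windows(feats, left=5, right=5)
print(windows.shape)                           # (100, 440)

# With a typical 10 ms frame shift, waiting for 5 future frames
# adds roughly 50 ms of latency before a prediction can be emitted.
frame_shift_ms, right = 10, 5
print(right * frame_shift_ms)                  # 50
```

This is the latency trade-off the paper targets: each additional future frame in the window improves context but delays the output by one frame shift.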

Cited by 12 publications (8 citation statements)
References 34 publications
“…1, a standard CNN pipeline (pooling, normalization, activations, dropout) can be employed after the first sinc-based convolution. Multiple standard convolutional, fully-connected or recurrent layers [37][38][39][40] can then be stacked together to finally perform a speaker classification with a softmax classifier.…”
Section: The SincNet Architecture
confidence: 99%
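The sinc-based first convolution described in the quoted passage can be sketched with plain NumPy; the kernel length, Hamming window, and band cut-offs below are illustrative assumptions rather than SincNet's exact parameterization.

```python
import numpy as np

def sinc_bandpass_kernel(f1, f2, length=101, fs=16000):
    """Ideal band-pass FIR kernel for cut-offs f1 < f2 (Hz) at sample rate fs.

    The band-pass response is the difference of two low-pass sinc
    filters, tapered by a Hamming window to reduce spectral leakage.
    """
    n = np.arange(length) - (length - 1) / 2           # centred time axis
    low = 2 * (f1 / fs) * np.sinc(2 * (f1 / fs) * n)   # low-pass at f1
    high = 2 * (f2 / fs) * np.sinc(2 * (f2 / fs) * n)  # low-pass at f2
    kernel = (high - low) * np.hamming(length)         # band-pass, windowed
    return kernel / np.abs(kernel).sum()               # simple normalisation

# "First convolution layer": one kernel per band, applied to raw audio.
bands = [(50, 300), (300, 1000), (1000, 4000)]         # assumed bands (Hz)
signal = np.random.randn(16000)                        # 1 s of fake waveform
outputs = np.stack([np.convolve(signal, sinc_bandpass_kernel(f1, f2), mode="same")
                    for f1, f2 in bands])
print(outputs.shape)                                   # (3, 16000)
```

In the architecture the quote describes, the two cut-off frequencies per filter are the learnable parameters, and the standard CNN pipeline (pooling, normalization, activations, dropout) operates on outputs like these.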
“…The current version supports standard MLPs, CNNs, RNNs, LSTM, and GRU models. Moreover, it supports some advanced recurrent architectures, such as the recently-proposed Light GRU [31] and twin-regularized RNNs [32]. The SincNet model [33,34] is also implemented to perform speech recognition from raw waveform directly.…”
Section: DNN Acoustic Modeling
confidence: 99%
“…The results show that the performance gap between the CI-phone CTC system and the A2W model is reduced. Ravanelli et al [266] performed experiments on the TIMIT, DIRHA, CHiME, and LibriSpeech databases. Hybrid feature extraction techniques were employed using MFCC, FBANK, and fMLLR features to train the RNN-HMM system, and a significant improvement over standard RNN systems was reported.…”
Section: English
confidence: 99%