2015
DOI: 10.48550/arxiv.1507.08240
Preprint
EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding

Cited by 36 publications (19 citation statements). References 25 publications.
“…The landscape of neural network acoustic modeling is rapidly evolving. Spurred by the success of deep feed-forward neural nets for LVCSR in [1] and inspired by other research areas like image classification and natural language processing, many speech groups have looked at more sophisticated architectures such as deep convolutional nets [2,3], deep recurrent nets [4], time-delay neural nets [5], and long-short term memory nets [6,7,8,9]. The trend is to remove a lot of the complexity and human knowledge that was necessary in the past to build good ASR systems (e.g.…”
Section: Introduction
confidence: 99%
“…In this paper, the contributions of the innovative research are summarised below: [13][14][15][16][17][18][19][20]…”
Section: Discussion
confidence: 99%
“…Various deep learning architectures such as deep neural networks, convolutional deep neural networks, deep belief networks and recurrent neural networks have been developed. Deep learning has been successfully applied in the areas of speech recognition [15,16,24], image processing [17], object detection [18,19], drug discovery [20] and genomics [21], etc. Especially, deep convolutional nets (ConvNets) have brought about breakthroughs in processing images [22], video [23], speech [24] and audio [25], whereas recurrent nets have shone light on sequential data such as text and speech [26].…”
Section: Existing Work In Deep Learning
confidence: 99%
“…We next evaluate our models on Wall Street Journal (WSJ) corpus (available as LDC corpus LDC93S6B and LDC94S13B), where we use the full 81 hour set "si284" for training, set "dev93" for validation and set "eval92" for test. We follow the same data preparation process and model setting as in [18], and we use 59 characters as the targets for the acoustic modelling. Decoding is done with the CTC [21] based weighted finite-state transducers (WFSTs) [22] as proposed by [18].…”
Section: Speech Recognition
confidence: 99%
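The statement above describes CTC-trained acoustic models whose outputs are decoded with WFSTs. A minimal sketch of the CTC output convention that such decoding builds on — collapsing repeated per-frame labels and removing blanks — might look as follows; this is not the EESEN WFST decoder itself, and the label indices here are hypothetical:

```python
# Hedged sketch: greedy CTC decoding (collapse repeats, then drop blanks).
# This illustrates the CTC label convention only; a full WFST-based decoder
# composes the CTC token graph with lexicon and language-model transducers.

BLANK = 0  # index of the CTC blank symbol (an assumption for this sketch)

def ctc_greedy_decode(frame_labels):
    """Collapse a per-frame argmax label sequence into an output sequence:
    merge consecutive duplicate labels, then remove blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out

# Frames: blank blank 3 3 blank 3 5 5 blank  ->  [3, 3, 5]
print(ctc_greedy_decode([0, 0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```

Note that a blank between two identical labels (the lone 0 between the 3s above) is what allows CTC to emit a genuine repeat; without it, consecutive 3s collapse into one.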