2019
DOI: 10.3390/sym11050644

End-to-End Mandarin Speech Recognition Combining CNN and BLSTM

Abstract: Since conventional Automatic Speech Recognition (ASR) systems often contain many modules and rely on varied expertise, such models are hard to build and train. Recent research shows that end-to-end ASR systems can significantly simplify the speech recognition pipeline and achieve performance competitive with conventional systems. However, most end-to-end ASR systems are neither reproducible nor comparable because they use specific language models and in-house training databases that are not freely available. Thi…

Cited by 34 publications (16 citation statements)
References 20 publications
“…The second approach is end-to-end speech recognition. It differs from sequential hierarchical analysis in that it analyzes the original signal and moves directly to higher levels of analysis (for example, the word level), bypassing the lower levels [17,18].…”
Section: Methods of Syllable Recognition (mentioning)
confidence: 99%
“…In order to predict a missing sequence of GPS points within the trajectory, it is necessary to know information from both previous and future timesteps. For this purpose, we make use of a bidirectional encoder so that the movement patterns in both directions can be captured [47]. Due to the spatiotemporal nature of GPS trajectory data, we make use of the ConvLSTM architecture [43] in our model, which is able to deal with both spatial and temporal dependencies in the data.…”
Section: Approach (mentioning)
confidence: 99%
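The bidirectional-encoder idea quoted above can be sketched in a few lines. The snippet below is a minimal, illustrative PyTorch example that substitutes a plain bidirectional LSTM for the ConvLSTM used in the cited work; the class name, layer sizes, and shapes are assumptions for illustration only, not the cited model.

```python
import torch
import torch.nn as nn

class BiTrajectoryEncoder(nn.Module):
    """Encodes a GPS trajectory (lat/lon per timestep) in both directions so
    a missing segment can be predicted from past and future context.
    (Hypothetical sketch; not the architecture of the cited paper.)"""
    def __init__(self, in_dim=2, hidden=64):
        super().__init__()
        # Bidirectional LSTM: captures movement patterns forwards and backwards.
        self.encoder = nn.LSTM(in_dim, hidden, batch_first=True,
                               bidirectional=True)
        # Linear head: maps concatenated forward/backward states to a GPS point.
        self.head = nn.Linear(2 * hidden, in_dim)

    def forward(self, traj):
        # traj: (batch, time, 2) observed points surrounding the gap
        ctx, _ = self.encoder(traj)      # (batch, time, 2 * hidden)
        return self.head(ctx)            # (batch, time, 2) predicted points

# Toy usage: 4 trajectories, 30 timesteps each.
model = BiTrajectoryEncoder()
points = torch.randn(4, 30, 2)
predicted = model(points)
```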
“…Sainath et al. (2015) combined CNNs, LSTMs, and DNNs into a unified deep learning system, named CLDNN, to take advantage of each architecture, and achieved better performance than the LSTM, which is considered the strongest of these three alternatives in speech recognition. Wang et al. (2019) proposed a CNN-BLSTM-CTC hybrid deep learning model for Mandarin speech recognition. They employed a CNN to learn local speech features, a BLSTM to learn past and future dependencies, and CTC for decoding, and claim that their proposed method outperformed the best existing model.…”
Section: Hybrid Approaches for Speech Recognition (mentioning)
confidence: 99%
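As a rough illustration of the CNN-BLSTM-CTC pipeline described in this statement, the following PyTorch sketch stacks a small convolutional front end, a bidirectional LSTM, and a CTC loss over log-mel spectrogram inputs. The layer sizes, vocabulary size, and the class name are illustrative assumptions, not the configuration reported by Wang et al. (2019).

```python
import torch
import torch.nn as nn

class CNNBLSTMCTC(nn.Module):
    """Hypothetical CNN + BLSTM + CTC acoustic model for illustration only."""
    def __init__(self, n_mels=80, vocab_size=4233, hidden=256):
        super().__init__()
        # CNN front end: learns local time-frequency patterns, downsampling by 4x.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=(2, 2), padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=(2, 2), padding=1),
            nn.ReLU(),
        )
        feat_dim = 32 * (n_mels // 4)
        # BLSTM: models past and future context along the time axis.
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=3,
                             batch_first=True, bidirectional=True)
        # Output layer: characters/syllables plus the CTC blank symbol.
        self.fc = nn.Linear(2 * hidden, vocab_size + 1)

    def forward(self, x):
        # x: (batch, 1, time, n_mels) log-mel spectrograms
        z = self.cnn(x)                           # (batch, 32, time/4, n_mels/4)
        b, c, t, f = z.shape
        z = z.permute(0, 2, 1, 3).reshape(b, t, c * f)
        z, _ = self.blstm(z)
        return self.fc(z).log_softmax(dim=-1)     # (batch, time/4, vocab+1)

# Toy CTC training step with random data.
model = CNNBLSTMCTC()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
x = torch.randn(2, 1, 200, 80)
log_probs = model(x).transpose(0, 1)              # (time, batch, vocab+1)
targets = torch.randint(1, 4234, (2, 20))         # label ids, 0 reserved for blank
input_lengths = torch.full((2,), log_probs.size(0), dtype=torch.long)
target_lengths = torch.full((2,), 20, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```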