2020
DOI: 10.1007/s11042-020-10073-7

Automatic speech recognition: a survey

citations
Cited by 211 publications
(78 citation statements)
references
References 115 publications
“…Therefore, it is evident that to tackle this issue we need to focus on the effective construction of the application domain of IoMT. The components of this application domain can be addressed as advanced level machine learning and deep learning [49], reasoning [50], natural language processing [51], speech recognition [52] and computer vision (image object recognition) [53], human-computer interaction, and dialog and narrative generation. From a global perspective, this can be used to incorporate the new generation hardware and software systems that imitate the human brain and cognitive functionality and thus enhance the human decision-making process.…”
Section: Discussion (mentioning)
confidence: 99%
“…Author details 1 College of Computer and Information Engineering, Hohai University, Nanjing, China. 2 School of Engineering Auditing, Jiangsu Key Laboratory of Public Project Audit, Nanjing Audit University, Nanjing, China.…”
Section: Abbreviations (mentioning)
confidence: 99%
“…In the front-end processing, the Mel frequency cepstral coefficient (MFCC) is widely used to represent the speech signal [1]. Besides, the perceptual linear predictive (PLP) features [2], spectro-temporal features [3], and cochlear filter cepstral coefficients (CFCC) features [4] have also been successfully used for speech recognition. In the backend classification, the statistical acoustic models are commonly used, such as hidden Markov model (HMM) [5], artificial neural network (ANN) [6], and dynamic Bayesian network (DBN) [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…Deep Neural Networks (DNNs) have shown noticeable success in handling Computer Vision (CV) problems, such as image classification [1], object detection [2], and face recognition [3]. They also have demonstrated great success in other complicated Machine Learning (ML) tasks such as speech recognition [4] and Natural Language Processing (NLP) [5], [6]. However, DNNs have not yet shown remarkable improvement in learning intrinsic concepts that lead to correct output labels [7].…”
Section: Introduction (mentioning)
confidence: 99%