2023
DOI: 10.32604/cmes.2022.021755

Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems

Abstract: Speech recognition systems have become a unique human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities; speech signal processing opens up a transparent and hand-free computation experience. This paper aims to present a retrospective yet modern approach to the world of speech recognition systems. The development journey of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies that have been highlighted in this paper. A st…

Cited by 14 publications (4 citation statements)
References 106 publications
“…In usual tests, the patient answers most of the questions orally to the examiner. We chose not to use speech recognition because of its limitations [29]. Incorrect speech interpretation would have led to false results.…”
Section: Discussion (mentioning)
confidence: 99%
“…"select", "place", "remove") . However, human speech signals vary across speakers, speaking styles, content, and uncertain environmental noises which leads to speech recognition systems with low accuracy and accessibility [6]. The most common solution consists of using virtual keyboards, which are easily supported by current MR frameworks and allow users to select letters, numbers or symbols that are transposed to the textual input.…”
Section: Body Content (mentioning)
confidence: 99%
“…Various methods are used to extract the features of speech frames, such as Convolutional Layer, LSTM, Transformer [16], or build a graph in speech frames to get more detailed features [17]. Then the encoded speech frames can be used in many tasks, such as speech recognition [18] and direct speech translation [19].…”
Section: Related Work (mentioning)
confidence: 99%