2022
DOI: 10.3390/aerospace9080395

Speech GAU: A Single Head Attention for Mandarin Speech Recognition for Air Traffic Control

Abstract: The rise of end-to-end (E2E) speech recognition technology in recent years has overturned the design pattern of cascading multiple subtasks in classical speech recognition and achieved direct mapping of speech input signals to text labels. In this study, a new E2E framework, ResNet–GAU–CTC, is proposed to implement Mandarin speech recognition for air traffic control (ATC). A deep residual network (ResNet) utilizes the translation invariance and local correlation of a convolutional neural network (CNN) to extra…

Cited by 10 publications (14 citation statements)
References 30 publications

“…The selected articles show that the end-to-end speech recognition framework is a study trend for better ASR strategies over traditional HMM-based models, especially given the challenges of multilingual recognition, special use of language, and high speech rate. Among the five articles which include ASR for aviation as a primary research objective, three articles adopted an end-to-end framework [3,8,17]. Neural networks-based methods are also widely used in ASR.…”
Section: Automatic Speech Recognition; citation type: mentioning
confidence: 99%
“…Hence, a structural topic modeling (STM) approach performs better when applied to NTSB reports [2]. NTSB aviation accident reports have non-standard abbreviations that add noise to the model-learning process [3,14]. Similar issues also exist in non-English databases, such as a study on Chinese civil aviation incident reports from the period 2007-2021 by Jiao et al [9] indicates that the experiment was interfered with due to invalid information, such as inconsistent writing norms and standards from different airlines.…”
Section: Ambiguity and Context; citation type: mentioning
confidence: 99%
“…Finally, a hybrid recommendation model based on ranking learning is proposed, which effectively solves the problem of weight distribution of the results of different recommendation algorithms [2]. Speech enhancement technology also has important applications in cochlear implants and hearing aids [3]. As a preprocessing module of hearing aids, it greatly reduces the noise interference of hearing aids to obtain a higher target speech signal-to-noise ratio [4].…”
Section: Introduction; citation type: mentioning
confidence: 99%