ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9746563

A Two-Step Approach to Leverage Contextual Data: Speech Recognition in Air-Traffic Communications

Cited by 12 publications (17 citation statements)
References 17 publications
“…• Thousands are correctly identified (100%); • Hundreds are correctly identified (100%); • Numbers between 11 and 99 (e.g., 13,18,34) have very high recognition rates (98%);…”
Section: Technical Results (mentioning)
confidence: 99%
“…Information extraction from written text can follow very different approaches and several factors (language, domain, entity type) impact the selected technique [11,12]. The extraction of information in the ATM domain has mainly used knowledge-based methods and machine learning models [5,13].…”
mentioning
confidence: 99%
“…In [13] contextual information (also known as contextual biasing) via n-gram composition (in the HCLG graph) is merged with semi-supervised learning techniques to further decrease word error rates (WER) on an ASR system designed for ATC. Boosting of contextual knowledge during and after decoding has also been explored in [29,30,31,32], where a set of target n-grams is added to further decrease WERs.…”
Section: Related Work (mentioning)
confidence: 99%
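The boosting idea quoted above can be illustrated with a minimal sketch. This is a hypothetical n-best rescoring example, not the papers' actual method (they compose the target n-grams directly into the HCLG decoding graph): each hypothesis containing an expected n-gram, such as a callsign known from surveillance data, receives a log-score bonus.

```python
# Minimal sketch of contextual n-gram boosting, shown at n-best rescoring
# time. Hypothetical example: real ATC systems compose boosted n-grams into
# the decoding graph (HCLG); here we simply add a fixed bonus to the
# log-score of any hypothesis containing a target n-gram.

def boost_hypotheses(hypotheses, target_ngrams, bonus=2.0):
    """Rescore an ASR n-best list with a per-match contextual bonus.

    hypotheses: list of (word_list, log_score) pairs.
    target_ngrams: set of word tuples expected in the current context.
    """
    rescored = []
    for words, score in hypotheses:
        # Count occurrences of any target n-gram in this hypothesis.
        matches = sum(
            1
            for n in {len(t) for t in target_ngrams}
            for i in range(len(words) - n + 1)
            if tuple(words[i:i + n]) in target_ngrams
        )
        rescored.append((words, score + bonus * matches))
    return sorted(rescored, key=lambda h: h[1], reverse=True)

# Context (hypothetical): surveillance data says callsign
# "lufthansa three two one" is currently airborne.
targets = {("lufthansa", "three", "two", "one")}
nbest = [
    ("lufthansa three to one descend".split(), -10.5),   # acoustic best
    ("lufthansa three two one descend".split(), -11.0),  # correct callsign
]
best_words, _ = boost_hypotheses(nbest, targets)[0]
print(" ".join(best_words))  # the hypothesis with the known callsign wins
```

The bonus only reorders hypotheses that are already acoustically close, which is why such boosting can lower WER on callsigns without degrading the rest of the transcript.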
“…LiveATC-Test: the test set is gathered from LiveATC data recorded from publicly accessible VHF radio channels as part of the ATCO2 project [13,32], and includes pilot and ATCO recordings with accented English from airports located in the U.S., the Czech Republic, Ireland, the Netherlands, and Switzerland. We consider LiveATC-Test a low-quality speech data set, i.e., signal-to-noise ratios (SNR) range from 5 to 15 dB [22].…”
Section: Private Databases (mentioning)
confidence: 99%
“…A semi-supervised learning approach for enhancing ASR in the ATM domain was employed in [11,12,14]. In [15][16][17], the authors aimed to improve the recognition of the callsigns in ASR by integrating surveillance data. Finally, the authors of [18] investigated the effect of fine-tuning large pre-trained models, trained using a Transformer architecture, for application in the ATC domain.…”
Section: Automatic Speech Recognition (ASR) (mentioning)
confidence: 99%