2022 IEEE Spoken Language Technology Workshop (SLT), 2023
DOI: 10.1109/slt54892.2023.10022718

BERTraffic: BERT-Based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications

Abstract: Automatic speech recognition (ASR) allows transcribing the communications between air traffic controllers (ATCOs) and aircraft pilots. The transcriptions are used later to extract ATC named entities, e.g., aircraft callsigns. One common challenge is speech activity detection (SAD) and speaker diarization (SD). In the failure condition, two or more segments remain in the same recording, jeopardizing the overall performance. We propose a system that combines SAD and a BERT model to perform speaker change detecti…
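The abstract is truncated here, so the exact output format of the model is not stated. One common way to frame joint speaker role and speaker change detection on ASR text is per-word tagging. The sketch below assumes a hypothetical BIO-style tag set (`B-ATCO`, `I-ATCO`, `B-PILOT`, `I-PILOT`) and shows how such tags could be decoded into speaker-attributed segments; the function name and tag labels are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple


def tags_to_segments(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-word tags into (role, text) segments.

    Assumes a hypothetical BIO-style tag set such as B-ATCO, I-ATCO,
    B-PILOT, I-PILOT: a B- tag opens a new segment (a speaker change)
    and an I- tag continues the current segment.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    segments: List[Tuple[str, List[str]]] = []
    for word, tag in zip(words, tags):
        prefix, _, role = tag.partition("-")
        # Open a new segment on a B- tag, on the first word, or when the
        # role changes without an explicit B- tag (tolerant decoding).
        if prefix == "B" or not segments or segments[-1][0] != role:
            segments.append((role, [word]))
        else:
            segments[-1][1].append(word)
    return [(role, " ".join(ws)) for role, ws in segments]


# Usage: one recording where the controller instruction and the pilot
# readback were left in the same segment.
words = "descend flight level eight zero descending eight zero".split()
tags = ["B-ATCO", "I-ATCO", "I-ATCO", "I-ATCO", "I-ATCO",
        "B-PILOT", "I-PILOT", "I-PILOT"]
print(tags_to_segments(words, tags))
```

Treating a `B-` tag as the speaker-change boundary lets a single token classifier answer both questions at once: where the speaker changes and which role speaks each segment.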

Cited by 7 publications (12 citation statements). References: 45 publications.
“…The main idea is to guarantee safety and reduce miscommunications between the ATCos and pilots. Therefore, previous work has shown the potential in performing SRD in an E2E manner on the text-level, as presented here (see in [8,81]).…”
Section: Model (mentioning; confidence: 77%)
“…In order to fine-tune the model, we append a layer on top of the BERT model by using a feedforward network with a dimension of 8 (we define two outputs per class, see the class structures in Section 3.3 of ref. [8] and in [15]). Due to the lack of gold transcriptions, we perform a fivefold cross-validation scheme to avoid overfitting.…”
Section: Named Entity Recognition for Air Traffic Control Communications (mentioning; confidence: 92%)
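The statement above reports a fivefold cross-validation scheme used in place of a held-out gold test set. As a hedged illustration (not the citing authors' code), a standard k-fold split can be sketched with the Python standard library alone; the function name `k_fold_splits` and the fixed shuffling seed are assumptions. Each item lands in the held-out fold exactly once, and a model would be fine-tuned on the remaining four folds in each round.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(n_items: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_items:
        raise ValueError("need 2 <= k <= n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # deterministic shuffle (assumed seed)
    # Spread items as evenly as possible: the first n_items % k folds get one extra.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield sorted(train), sorted(test)
        start += size


# Usage: 23 transcribed utterances split into five folds.
for fold, (train, test) in enumerate(k_fold_splits(23, k=5)):
    print(f"fold {fold}: {len(train)} train / {len(test)} test")
```

Reporting the mean and spread of the metric over the five held-out folds gives a less optimistic estimate than a single split when annotated data is scarce.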
“…ATCO2 5 is also a well-known project that developed a pipeline [20] for automatic collection and pre-processing of large quantities of ATC audio data. Their main focus covered downstream task such as ASR [21], named-entity recognition [22], and acoustic and text-based speaker role recognition [23,24]. HAAWAII 6 project develops a reliable and adaptable solution to automatically transcribe voice utterances issued by both ATCOs and pilots.…”
Section: Related Work (mentioning; confidence: 99%)
“…Our previous research on identifying speaker roles [4] mainly focused on a grammar-based bag-of-words system that was capable of performing speaker role identification with precision/recall values of 0.82/0.81 for ATCos and 0.84/0.85 for pilots, respectively. Also, in [28][29][30], we explored speaker change detection for ATC text. In [31], the authors mentioned that manually annotating pilot recordings was more challenging than annotating ATCo recordings due to their quality, speech rate, speaker accent, etc.…”
Section: Speaker Role Classification (mentioning; confidence: 99%)
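The quoted statement gives precision/recall pairs but no single summary score. The F1 score, the harmonic mean of precision and recall, is a standard way to combine them. The values this sketch prints are derived here from the quoted numbers and were not reported by the authors; the helper name `f1_score` is an illustrative assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the citation statement above.
reported = {"ATCo": (0.82, 0.81), "pilot": (0.84, 0.85)}
for role, (p, r) in reported.items():
    print(f"{role}: F1 = {f1_score(p, r):.3f}")
```

Because precision and recall are close for both roles, the resulting F1 values (about 0.815 for ATCos and 0.845 for pilots) sit between the two numbers in each pair.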