2023
DOI: 10.3390/app131910834
Non-Intrusive Air Traffic Control Speech Quality Assessment with ResNet-BiLSTM

Yuezhou Wu,
Guimin Li,
Qiang Fu

Abstract: In the current field of air traffic control speech, there is a lack of effective objective speech quality evaluation methods. This paper proposes a new network framework based on ResNet–BiLSTM to address this issue. Firstly, the mel-spectrogram of the speech signal is segmented using the sliding window technique. Next, a preceding feature extractor composed of convolutional and pooling layers is employed to extract shallow features from the mel-spectrogram segment. Then, ResNet is utilized to extract spatial f…
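The front-end described in the abstract — a mel-spectrogram cut into overlapping segments with a sliding window — can be sketched as follows. The segment length and hop size here are illustrative assumptions, not values reported by the paper:

```python
import numpy as np

def segment_spectrogram(spec, seg_frames=128, hop_frames=64):
    """Slide a fixed-length window along the time axis of a
    (n_mels, n_frames) log-mel-spectrogram, producing overlapping
    segments as in the paper's front-end."""
    n_mels, n_frames = spec.shape
    starts = range(0, n_frames - seg_frames + 1, hop_frames)
    return np.stack([spec[:, s:s + seg_frames] for s in starts])

# Stand-in for a 64-band log-mel-spectrogram with 400 time frames.
spec = np.random.rand(64, 400)
segs = segment_spectrogram(spec)
print(segs.shape)  # (5, 64, 128): windows start at frames 0, 64, 128, 192, 256
```

Each segment is then fed independently to the convolutional feature extractor, so one utterance yields a sequence of segment-level feature maps.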



Cited by 4 publications (1 citation statement)
References 39 publications
“…With the advancement of deep learning, researchers have turned to deep learning methods for extracting speech emotion features. Common deep learning methods for speech feature extraction include Convolutional Neural Networks (CNNs) [38,39], Recurrent Neural Networks (RNNs) [40], Bidirectional Long Short-Term Memory (BiLSTM) [41][42][43], etc. The Wav2vec2.0 version used in this paper is an end-to-end training approach that can learn representative feature descriptions directly from audio data through self-supervised learning, eliminating the need for manual parameter tuning.…”
Section: Wav2vec Speech Features
confidence: 99%
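The CNN/BiLSTM combination mentioned in the citation statement mirrors the cited paper's own ResNet–BiLSTM design. A toy PyTorch stand-in — with a small convolutional block in place of ResNet and purely illustrative layer sizes — shows how per-segment features feed a BiLSTM to yield one quality score per utterance:

```python
import torch
import torch.nn as nn

class ResNetBiLSTMSketch(nn.Module):
    """Toy stand-in for a ResNet-BiLSTM quality estimator.
    All layer sizes are illustrative assumptions, not the paper's."""
    def __init__(self, feat_dim=8, hidden=32):
        super().__init__()
        # Shallow conv block standing in for the ResNet feature extractor.
        self.conv = nn.Sequential(
            nn.Conv2d(1, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),  # one feature vector per segment
        )
        self.bilstm = nn.LSTM(feat_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # MOS-like quality score

    def forward(self, segs):
        # segs: (batch, n_segments, n_mels, n_frames)
        b, s, m, t = segs.shape
        x = self.conv(segs.reshape(b * s, 1, m, t)).reshape(b, s, -1)
        out, _ = self.bilstm(x)          # (batch, n_segments, 2 * hidden)
        return self.head(out[:, -1])     # one score per utterance

model = ResNetBiLSTMSketch()
segs = torch.randn(2, 5, 64, 128)  # 2 utterances, 5 segments each
print(model(segs).shape)           # torch.Size([2, 1])
```

The BiLSTM aggregates the segment sequence in both temporal directions, which is what motivates its use over a plain CNN for utterance-level scoring.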