2018
DOI: 10.1109/taslp.2018.2847459

Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks

Abstract: Speech Intelligibility Prediction (SIP) algorithms are becoming popular tools in the development and operation of speech processing devices and algorithms. However, many SIP algorithms require knowledge of the underlying clean speech, a signal that is often not available in real-world applications. This has led to increased interest in non-intrusive SIP algorithms, which do not require clean speech to make predictions. In this paper we investigate the use of Convolutional Neural Networks (CNNs) for non-int…

Cited by 45 publications (27 citation statements)
References 64 publications
“…Kendall's (τ) values obtained by the N-MTTL SI model (ResNet-18 MT) are 0.80 and 0.69 for seen and unseen conditions, respectively. Overall, the accuracy measures of the model are comparable to the literature [17], where similar values are observed for unseen conditions. However, conditions in our dataset are not identical to [17], therefore direct comparison is not possible.…”
Section: Speech Intelligibility Prediction | Citation type: supporting
confidence: 78%
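
The statement above reports agreement between predicted and measured intelligibility as Kendall's τ. As a rough illustration of what that number measures, the following is a minimal sketch of the tau-a variant (concordant minus discordant pairs, divided by all pairs). Which variant the citing paper used is not stated in the excerpt, so tau-a is an assumption, and the function name `kendall_tau_a` and the sample scores are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(predicted, measured):
    """Kendall's tau-a between two equal-length score lists.

    Assumption: the tau-a variant is used here for simplicity; tied pairs
    count as neither concordant nor discordant.
    """
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    concordant = 0
    discordant = 0
    # Compare every unordered pair of conditions once.
    for (p1, m1), (p2, m2) in combinations(zip(predicted, measured), 2):
        sign = (p1 - p2) * (m1 - m2)
        if sign > 0:
            concordant += 1  # both scores rank the pair the same way
        elif sign < 0:
            discordant += 1  # the scores disagree on the pair's order

    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for four listening conditions (not from any paper).
predicted_scores = [0.2, 0.5, 0.4, 0.9]
measured_scores = [0.1, 0.6, 0.7, 0.8]
tau = kendall_tau_a(predicted_scores, measured_scores)
```

A value of 1 means the predictor ranks every pair of conditions in the same order as the listening test, 0 means no rank agreement, and -1 means the ranking is fully reversed.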
“…The residual block also contains a batch normalization layer and Rectified Linear Unit (ReLU) activation function, which is used after every convolutional layer [26]. Previous work [17] has explored the importance of the convolution layer to extract spectro-temporal patterns in the input signal related to speech intelligibility. Therefore, we expect the convolutional layers of ResNet to be beneficial for both our specific tasks in our N-MTTL SI model.…”
Section: ResNet (Residual Network) | Citation type: mentioning
confidence: 99%
“…However, since there is no reference signal for the receiver such as communication, a non-intrusive intelligibility estimation method is required. In [3], the speech intelligibility is predicted using convolutional neural network which is trained with measured intelligibility scores that humans listen and evaluate. The work in [4] presented the method of speech intelligibility prediction by using automatic speech recognition (ASR) system based deep neural networks.…”
Section: Introduction | Citation type: mentioning
confidence: 99%