Speech intelligibility prediction with the dynamic compressive gammachirp filterbank and modulation power spectrum

Yamamoto, Kazumasa; Matsui, Toshie; Araki, Shoko; Kinoshita, Keisuke; Nakatani, Tomohiro

doi:10.1250/ast.40.84

Cited by 3 publications

(10 citation statements)

References 24 publications


“…The dcGC-FB was also used in models for speech intelligibility prediction [9][10][11][12][13]. A new model referred to as GEDI (the gammachirp envelope distortion index) [10,11] predicted the intelligibility of speech sounds processed with non-linear enhancement algorithms better than other recent indexes like STOI, CSII, and HASPI [12,13].…”

Section: Results (mentioning)

confidence: 99%

The gammachirp auditory filter and its application to speech perception

Patterson

2020

Acoust. Sci. & Tech.

We review the gammachirp (GC) auditory filter and its use in speech perception research. The GC was originally developed to explain the asymmetric, auditory filter shapes derived in notched-noise (NN) masking studies, and the strongly compressive input-output function observed in the mammalian cochlea. This compressive GC was fitted to a very large collection of notched-noise (NN) masking thresholds measured with a wide range of stimulus levels and center frequencies. The fit showed how the GC auditory filter could explain NN masking throughout the domain of human hearing with a relatively small number of parameters, only one of which was level dependent. Subsequently, a dynamic, compressive GC filterbank (dcGC-FB) was developed to simulate time-domain cochlear processing. This dcGC-FB has been used to cancel the peripheral compression of normal hearing and thereby simulate the most common forms of hearing loss. This simulator allows normal hearing listeners to experience the difficulties of hearing impaired listeners. It has been used in training courses for speech-language-hearing therapists and psychoacoustic experiments. The dcGC-FB has also been used for modeling speaker size perception and predicting speech intelligibility with GEDI (the gammachirp envelope distortion index).

“…The GC architecture enables us to construct a hearing impairment simulator which allows normal hearing listeners to experience the difficulties of hearing impaired listeners [5,6]. The GC has also been used to model speaker size perception [7,8] and speech intelligibility [9][10][11][12][13].…”

Section: Introduction (mentioning)

confidence: 99%

“…To incorporate characteristics of a human auditory filter, Yamamoto et al (2019) extended sEPSM using a dynamic compressive gammachirp filterbank (dcGC-FB) (Irino & Patterson, 2006), in which the level-dependent frequency selectivity and gain of the auditory filter were reasonably determined by the data obtained from psychoacoustic masking experiments (Patterson et al, 2003). For OIMs, it is important to introduce the appropriate level dependency to incorporate the well-known fundamental knowledge that speech intelligibility is lower as sound level decreases and that peripheral hearing loss decreases the intelligibility.…”

Section: Objective Intelligibility Measures for Speech Enhancement (mentioning)

confidence: 99%

“…A bank of modulation filters, defined in envelope frequency domain ($f_{\mathrm{env}}$), is applied to the spectra. There are seven modulation filters whose power spectra are $W_{f^{c}_{\mathrm{env}}}(f_{\mathrm{env}})$ for the modulation center frequency of $f^{c}_{\mathrm{env}}$, as illustrated in Figure 2 and described in previous studies (Jørgensen & Dau, 2011; Yamamoto et al, 2019). The envelope power at the output of the modulation filter is calculated as…”

Section: SDR in the Envelope Modulation Domain (mentioning)

confidence: 99%

“…where the asterisk (*) represents either S or D, and $E_{\hat{S}}(0)$ represents the 0-th order coefficient of the FFT (i.e., the direct-current (DC) component of the temporal envelope). Yamamoto et al (2019) reported that normalization of the $E_{\hat{S}}(0)$ in dcGC-sEPSM was effective for speech intelligibility prediction of enhanced speech. The normalization has been inherited by GEDI which has the same filterbank structure as dcGC-sEPSM.…”

Section: SDR in the Envelope Modulation Domain (mentioning)

confidence: 99%
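The two statements above describe weighting the envelope spectrum with a modulation filter and normalizing by the DC component of the temporal envelope. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the formula from the cited papers (the quoted equation is elided in the source): the function name `dc_normalized_envelope_power` is hypothetical, a rectangular band stands in for the actual modulation filter shape, and the band is assumed to exclude both 0 Hz and the Nyquist frequency.

```python
import numpy as np

def dc_normalized_envelope_power(envelope, fs, band):
    """Envelope power inside one modulation band, divided by the DC power.

    Hypothetical helper for illustration only. `band` is (low_hz, high_hz)
    and is assumed to exclude 0 Hz and the Nyquist frequency.
    """
    envelope = np.asarray(envelope, dtype=float)
    n = len(envelope)
    # One-sided spectrum scaled by n, so bin 0 equals the envelope mean (DC).
    spectrum = np.fft.rfft(envelope) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    low_hz, high_hz = band
    # Rectangular weighting: an assumed stand-in for a modulation filter.
    weights = ((freqs >= low_hz) & (freqs < high_hz)).astype(float)
    # Factor 2 accounts for the negative-frequency half of the spectrum.
    band_power = 2.0 * np.sum(weights * np.abs(spectrum) ** 2)
    dc_power = np.abs(spectrum[0]) ** 2
    return band_power / dc_power

# Usage: a 4 Hz sinusoidal modulation of depth 0.5 around a mean of 1.
fs = 100
t = np.arange(fs) / fs
env = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
print(dc_normalized_envelope_power(env, fs, (2.0, 8.0)))
```

Dividing by the DC power makes the result depend on modulation depth rather than on overall level: scaling the envelope by a constant leaves the value unchanged.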

GEDI: Gammachirp Envelope Distortion Index for Predicting Intelligibility of Enhanced Speech

Yamamoto, Irino, Araki, et al.

2019

Preprint

In this study, we propose a new concept, the gammachirp envelope distortion index (GEDI), based on the signal-to-distortion ratio in the auditory envelope, $\mathrm{SDR}_{\mathrm{env}}$, to predict the intelligibility of speech enhanced by nonlinear algorithms. The objective of GEDI is to calculate the distortion between enhanced and clean-speech representations in the domain of a temporal envelope extracted by the gammachirp auditory filterbank and modulation filterbank. We also extend GEDI with multi-resolution analysis (mr-GEDI) to predict the speech intelligibility of sounds under non-stationary noise conditions. We evaluate GEDI in terms of the speech intelligibility predictions of speech sounds enhanced by a classic spectral subtraction and a Wiener filtering method. The predictions are compared with human results for various signal-to-noise ratio conditions with additive pink and babble noises. The results showed that mr-GEDI predicted the intelligibility curves better than the short-time objective intelligibility (STOI) measure, the extended-STOI (ESTOI) measure, and the hearing-aid speech perception index (HASPI) under pink-noise conditions, and better than HASPI under babble-noise conditions. The mr-GEDI method does not present an overestimation tendency and is considered a more conservative approach than STOI and ESTOI. Therefore, the evaluation with mr-GEDI may provide additional information in the development of speech enhancement algorithms.
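The abstract describes a signal-to-distortion ratio computed between clean and enhanced temporal envelopes. A rough Python sketch of such a ratio is given below. It is a simplification under loud assumptions, not the GEDI implementation: the function name `sdr_env_db` is hypothetical, the distortion is assumed to be the sample-wise difference between the two envelopes, and the gammachirp and modulation filterbanks are skipped entirely, so the ratio is broadband rather than per channel.

```python
import numpy as np

def sdr_env_db(clean_env, enhanced_env, eps=1e-12):
    """Broadband signal-to-distortion ratio between two envelopes, in dB.

    Hypothetical helper for illustration only. The distortion is assumed
    to be the difference between the enhanced and the clean envelope.
    """
    clean_env = np.asarray(clean_env, dtype=float)
    enhanced_env = np.asarray(enhanced_env, dtype=float)
    distortion = enhanced_env - clean_env
    signal_power = np.mean(clean_env ** 2)
    distortion_power = np.mean(distortion ** 2)
    # eps guards against division by zero when the envelopes are identical.
    return 10.0 * np.log10((signal_power + eps) / (distortion_power + eps))

# Usage: a 10 % gain error on a flat envelope.
clean = np.ones(100)
enhanced = 1.1 * clean
print(sdr_env_db(clean, enhanced))
```

A larger value means the enhanced envelope stays closer to the clean one; a full model along the lines of the abstract would compute such ratios per auditory channel and modulation band before mapping them to intelligibility.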
