Introduction

Late reflections degrade speech intelligibility [1], whereas early reflections often improve it; this is known as the Haas effect (e.g., [2]). It has been reported that the main cause of degraded speech intelligibility in reverberant environments is overlap-masking [3,4,5], in which the reverberant components of preceding speech segments mask the segments that follow. As a result, speech segments that follow reverberating segments are more difficult to understand, and the masking effect grows as the energy of the preceding segments increases. This is particularly important when the preceding segment is a vowel, which has more power, and the following segment is a consonant, which has less power [6,7].

A number of studies have proposed and discussed how the intelligibility of speech in reverberation can be estimated from the impulse response of a room. Reverberation time, such as T60, is a simple objective parameter for characterizing reverberation [8]. Speech intelligibility usually decreases as T60 becomes longer, but rooms with the same T60 may yield different degrees of speech intelligibility, for example when their direct-to-reverberant energy ratios differ. Deutlichkeit (definition), such as D50 [9,10], and clarity, such as C50 [9,11], take this direct-to-reverberant ratio into account. The speech transmission index (STI) is another parameter widely used to measure speech intelligibility objectively [12,13]; it is based on the modulation transfer function, which depends on reverberation [14].

The intelligibility of speech also depends on the speech signal itself. To reduce overlap-masking, Arai et al. [6,7] proposed "steady-state suppression" as a preprocessing step for speech signals in reverberant environments. Strange et al. [15] showed that the information in steady-state portions of a speech signal was relatively insignificant compared with the information in transient portions. Additionally, steady-state portions usually have more energy than transient portions. In the "steady-state suppression" technique, overlap-masking is reduced by estimating and suppressing steady-state
To improve speech intelligibility in reverberant environments, Arai et al. proposed "steady-state suppression (SSS)" as preprocessing [Arai et al., Acoust. Sci. Technol. 23, 229–232 (2002)]. In this study, a perceptual experiment under artificial reverberant conditions with simulated impulse responses was conducted to elucidate the effects of the Deutlichkeit (D) value and reverberation time (RT) on the improvement in speech intelligibility due to SSS. Artificial impulse responses were simulated as white noise multiplied by a decay curve. The advantage of this method is that the simulated impulse responses have mutually similar frequency characteristics; consequently, they can be compared using only the D value and RT, without confounding differences in frequency characteristics. Two parameters, the energy of the impulse response within 50 ms of the direct sound and the attenuation rate of the decay curve, were controlled to obtain impulse responses with given D values and RTs. The results show that SSS improved speech intelligibility under low-D-value conditions, regardless of whether RT was long or short. These results can also be interpreted as indicating that the processing is effective when the original speech intelligibility is below 60%. [Work supported by JSPS KAKENHI (16203041).]
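The objective measures named above and the white-noise simulation method can be illustrated together. The following is a minimal NumPy sketch, not the procedure used in the study: it builds an artificial impulse response as white noise multiplied by an exponential decay curve, scales the first 50 ms with a hypothetical `early_gain` parameter so that the D value can be changed independently of the decay rate, and then estimates D50 (early energy over total energy), C50 (early-to-late energy ratio in dB), and T60 (Schroeder backward integration with a line fitted between -5 and -35 dB). The function names, the exponential decay model, the fit range, and the assumption that the direct sound sits at sample 0 are all choices made for illustration.

```python
import numpy as np


def simulate_ir(fs, rt60, early_gain=1.0, duration=None, seed=0):
    """Hypothetical helper: white noise multiplied by an exponential decay curve.

    rt60       -- seconds for the energy envelope to fall by 60 dB
    early_gain -- assumed knob: amplitude scale applied to the first 50 ms,
                  which changes D50 without changing the decay rate
    The direct sound is assumed to be at sample 0.
    """
    if duration is None:
        duration = 1.5 * rt60  # long enough for the envelope to fall about 90 dB
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    noise = np.random.default_rng(seed).standard_normal(n)
    # Amplitude falls by a factor of 1e-3 (energy by 60 dB) after rt60 seconds.
    envelope = 10.0 ** (-3.0 * t / rt60)
    h = noise * envelope
    h[t < 0.050] *= early_gain
    return h


def definition_d50(h, fs):
    """D50: fraction of the total energy arriving within the first 50 ms."""
    e = h ** 2
    n50 = int(round(0.050 * fs))
    return e[:n50].sum() / e.sum()


def clarity_c50(h, fs):
    """C50: ratio (dB) of energy in the first 50 ms to the energy after it."""
    e = h ** 2
    n50 = int(round(0.050 * fs))
    return 10.0 * np.log10(e[:n50].sum() / e[n50:].sum())


def t60_schroeder(h, fs, upper_db=-5.0, lower_db=-35.0):
    """Estimate T60 from the Schroeder energy decay curve (EDC).

    A line is fitted to the EDC between upper_db and lower_db and
    extrapolated to a 60 dB decay (a T30-style estimate).
    """
    e = h ** 2
    edc = np.cumsum(e[::-1])[::-1]  # backward integration
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(len(h)) / fs
    fit = (edc_db <= upper_db) & (edc_db >= lower_db)
    slope, _ = np.polyfit(t[fit], edc_db[fit], 1)  # dB per second, negative
    return -60.0 / slope
```

With `fs = 16000` and `rt60 = 1.0`, the T60 estimate should come out close to 1 s, and raising `early_gain` with the same seed raises D50 while leaving the late decay unchanged. The two definitions are linked by C50 = 10 log10(D50 / (1 - D50)). A strongly boosted `early_gain` also bends the start of the decay curve, so the T60 fit is best read from the later, linear part of the EDC.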
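Steady-state suppression is only summarized here, so the code below is a hypothetical, simplified sketch rather than Arai et al.'s algorithm. It follows the idea stated above that steady-state portions carry little information but much energy: a frame is treated as steady-state when its short-time magnitude spectrum differs little from the previous frame's, and such frames are attenuated. The function name, the 20 ms frame and 10 ms hop, the relative spectral-change measure, the threshold of 0.2, and the gain of 0.3 are all assumptions chosen for illustration.

```python
import numpy as np


def steady_state_suppression(x, fs, frame_ms=20.0, hop_ms=10.0,
                             threshold=0.2, gain=0.3):
    """Hypothetical, simplified steady-state suppression.

    A frame counts as steady-state when its short-time magnitude spectrum
    differs from the previous frame's by less than `threshold` (relative
    L2 change); such frames are scaled by `gain`. Returns the processed
    signal and the per-frame gains.
    """
    n = int(round(frame_ms * 1e-3 * fs))
    hop = int(round(hop_ms * 1e-3 * fs))
    if len(x) < n:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(n)
    n_frames = 1 + (len(x) - n) // hop
    frames = np.stack([x[i * hop:i * hop + n] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))

    # Relative spectral change between consecutive frames; frame 0 has no
    # predecessor, so it is treated as transient (infinite change).
    diff = np.linalg.norm(mag[1:] - mag[:-1], axis=1)
    ref = np.linalg.norm(mag[:-1], axis=1)
    change = np.concatenate([[np.inf], diff / (ref + 1e-12)])
    frame_gain = np.where(change < threshold, gain, 1.0)

    # Overlap-add the frame gains with the analysis window to get a smooth
    # per-sample gain curve, then apply it to the input.
    num = np.zeros(len(x))
    den = np.zeros(len(x))
    for i, g in enumerate(frame_gain):
        num[i * hop:i * hop + n] += g * window
        den[i * hop:i * hop + n] += window
    gain_curve = np.where(den > 1e-8, num / np.maximum(den, 1e-8), 1.0)
    return x * gain_curve, frame_gain
```

For a signal consisting of a steady tone followed by noise, the frames inside the tone are scaled by `gain` while the rapidly changing noise frames pass through unchanged. In a listening experiment, the processed signal would then be convolved with a room impulse response, matching the role of SSS as preprocessing applied before reverberation. Smoothing the gain curve by overlap-adding the analysis window avoids abrupt gain steps at frame boundaries.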