Individuals with hearing impairment have particular difficulty perceptually segregating concurrent voices and understanding a talker in the presence of a competing voice. In contrast, individuals with normal hearing perform this task quite well. This listening situation represents a very different problem for both the human and machine listener, when compared to perceiving speech in other types of background noise. A machine learning algorithm is introduced here to address this listening situation. A deep neural network was trained to estimate the ideal ratio mask for a male target talker in the presence of a female competing talker. The monaural algorithm was found to produce sentence-intelligibility increases for hearing-impaired (HI) and normal-hearing (NH) listeners at various signal-to-noise ratios (SNRs). This benefit was largest for the HI listeners and averaged 59%-points at the least-favorable SNR, with a maximum of 87%-points. The mean intelligibility achieved by the HI listeners using the algorithm was equivalent to that of young NH listeners without processing, under conditions of identical interference. Possible reasons for the limited ability of HI listeners to perceptually segregate concurrent voices are reviewed as are possible implementation considerations for algorithms like the current one.
Individuals with hearing impairment have particular difficulty perceptually segregating concurrent voices and understanding a talker in the presence of a competing voice. In contrast, individuals with normal hearing perform this task quite well. A machine learning algorithm is introduced here to address this listening situation. A deep neural network was trained to estimate the ideal ratio mask for a target talker in the presence of a single competing talker. The algorithm was found to produce sentence-intelligibility increases for hearing-impaired and normal-hearing listeners at various signal-to-noise ratios. This benefit was largest for the hearing-impaired listeners and averaged 59 %-points at the least-favorable SNR, with a maximum of 87 %-points. The mean intelligibility achieved by the hearing-impaired listeners using the algorithm was equivalent to that of young normal-hearing listeners without processing, under conditions of identical interference. Possible reasons for the limited ability of hearing-impaired listeners to perceptually segregate concurrent voices are also addressed.
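The abstracts above refer to the ideal ratio mask without defining it. As a rough illustration only, the sketch below uses one common formulation, in which each time-frequency unit receives a gain equal to the square root of the speech energy divided by the speech-plus-noise energy. The function names, the exponent `beta`, and the list-of-rows data layout are assumptions made for this example, not details taken from the study. In the study itself a deep neural network estimates the mask from the mixture alone; here the mask is computed from known speech and noise energies, which is what makes it "ideal".

```python
def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5):
    """Compute an ideal ratio mask from per-unit speech and noise energies.

    Both inputs are lists of rows (one row per frequency channel, one value
    per time frame). The exponent beta=0.5 is an assumed, commonly used value.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            # Units with no energy at all receive zero gain.
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


def apply_mask(mixture_magnitude, mask):
    """Scale each time-frequency unit of the mixture by its mask gain."""
    return [
        [m * g for m, g in zip(m_row, g_row)]
        for m_row, g_row in zip(mixture_magnitude, mask)
    ]
```

A unit dominated by speech receives a gain near 1, a unit dominated by noise a gain near 0, and units in between are attenuated in proportion.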
Time-frequency (T-F) masks represent powerful tools to increase the intelligibility of speech in background noise. Translational relevance is provided by their accurate estimation based only on the signal-plus-noise mixture, using deep learning or other machine-learning techniques. In the current study, a technique is designed to capture the benefits of existing techniques. In the ideal quantized mask (IQM), speech and noise are partitioned into T-F units, and each unit receives one of N attenuations according to its signal-to-noise ratio. It was found that as few as four to eight attenuation steps (IQM4, IQM8) improved intelligibility over the ideal binary mask (IBM, having two attenuation steps), and equaled the intelligibility resulting from the ideal ratio mask (IRM, having a theoretically infinite number of steps). Sound-quality ratings and rankings of noisy speech processed by the IQM4 and IQM8 were also superior to that processed by the IBM and equaled or exceeded that processed by the IRM. It is concluded that the intelligibility and sound-quality advantages of infinite attenuation resolution can be captured by an IQM having only a very small number of steps. Further, the classification-based nature of the IQM might provide algorithmic advantages over the regression-based IRM during machine estimation.
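The idea of assigning each T-F unit one of N attenuations can be sketched as follows. This is a minimal illustration under stated assumptions: it quantizes a ratio-mask gain uniformly into `n_steps` levels between 0 and 1, which may differ from the attenuation steps and SNR boundaries actually used in the study. The function name and the uniform spacing are hypothetical.

```python
import math


def ideal_quantized_mask(speech_energy, noise_energy, n_steps=4):
    """Quantize a ratio-mask gain into n_steps equally spaced levels.

    Inputs are lists of rows of per-unit energies. Uniform spacing of the
    levels is an assumption of this sketch, not a detail from the study.
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    top = n_steps - 1
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            gain = (s / total) ** 0.5 if total > 0 else 0.0
            # Round to the nearest of the n_steps allowed gain levels.
            level = math.floor(gain * top + 0.5)
            row.append(level / top)
        mask.append(row)
    return mask
```

With `n_steps=2` every gain is either 0 or 1, so the mask is binary; as `n_steps` grows the mask approaches the continuous ratio mask. Because each unit takes one of a fixed set of labels, estimating such a mask can be framed as classification rather than regression.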
Binary masking represents a powerful tool for increasing speech intelligibility in noise. An essential aspect involves the local criterion (LC), which defines the signal-to-noise ratio below which time-frequency units are discarded. But binary masking is a victim of its own success in one regard: it produces ceiling sentence intelligibility across a broad range of LC values, making the exact optimal LC value difficult to determine. Further, the optimal value for hearing-impaired (HI) listeners is largely unknown. In the current study, the optimal LC was determined in normal-hearing (NH) and HI listeners using speech materials less likely to produce ceiling effects. The CID W22 words were mixed with noise consisting of recordings from a busy hospital cafeteria, then subjected to ideal binary masking. LC values ranged from -20 to +5 dB relative to the overall SNR of -8 dB. NH subjects were tested at 65 dBA and HI subjects were tested at 65 dBA plus NAL-RP hearing-aid gains. Preliminary results suggest that the optimal LC is similar for NH and HI listeners. Additional conditions involving different speech materials and noise types suggest that the optimal LC can vary as a function of speech and/or noise type. [Work supported by NIH.]
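The role of the local criterion can be sketched in a few lines. The example below assumes that an LC "relative to the overall SNR" means the absolute threshold is the overall SNR plus the relative value, so a relative LC of 0 dB at an overall SNR of -8 dB discards units whose local SNR is at or below -8 dB. The function name, parameter names, and the handling of zero-energy units are assumptions of this sketch.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy,
                      overall_snr_db=-8.0, relative_lc_db=0.0):
    """Keep (1) or discard (0) each T-F unit by comparing its SNR to the LC.

    Inputs are lists of rows of per-unit energies. The absolute criterion is
    assumed to be overall_snr_db + relative_lc_db.
    """
    lc_db = overall_snr_db + relative_lc_db
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0:
                keep = 0  # no speech energy: nothing to retain
            elif n <= 0:
                keep = 1  # no noise energy: local SNR is unbounded
            else:
                local_snr_db = 10.0 * math.log10(s / n)
                keep = 1 if local_snr_db > lc_db else 0
            row.append(keep)
        mask.append(row)
    return mask
```

Lowering the relative LC toward -20 dB retains more noise-dominated units, while raising it toward +5 dB discards more of the mixture, including some units that carry usable speech.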
Recent work has shown that a machine learning algorithm can produce large increases in speech intelligibility in noise for hearing-impaired listeners. This algorithm involves a deep neural network trained through supervised learning to estimate the ideal binary or ratio mask. The direct translational potential of this work is addressed here. Primary issues surrounding future implementation into hearing aids and cochlear implants involve (i) the ability to generalize to conditions not encountered during training, and (ii) the computational load associated with operation of such an algorithm. Substantial advances have been made with regard to generalization. These will be outlined, as will the associated design decisions. The computational load associated with training and operation of a network will also be addressed. Here, we propose an alternative implementation that offers multiple advantages over current approaches.