This study investigates how well a signal sound can be heard in the presence of noise when the signal and the interfering sound each have a directional characteristic and are separated from each other by more than a certain amount, compared with the case in which they are not separated. We attempt to determine through what process in the auditory system this phenomenon occurs. We propose that, when many sounds arrive from many directions, the auditory system extracts directional information from the neural signals from both ears, localizes each sound to its place or direction, and selects one particular sound to listen to. A subject's ability to hear a sound differs considerably depending on whether or not the subject concentrates attention on it. To account for this difference, the concept of “attention” is introduced. It is shown that the function that maintains attention leads to the selective hearing of a signal according to its direction and timbre.
Although more than half a century of research has identified several factors behind the "cocktail party effect", the major one is considered to be the spatial separation of the target signal and the interferer. This paper reviews developments concerning the improvement in performance that results from directional separation of the target signal from interferers when listening in a sound field or through headphones. The basic assumption concerning the cocktail party effect is that there are one or more interfering sound sources in addition to the target signal source. In this situation it is important to consider the selective attention effect, in which concentrating attention on a specific signal attenuates the interfering sound. Pitch is the simplest cue for selective attention, but spatial information can also serve as a cue. The latter half of this review discusses the effects of spatial filtering and of an attention filter in the frequency domain.
Abstract: We can communicate with others in a noisy environment. This phenomenon is known as the "Cocktail Party Effect" and is one of the most important binaural functions. This paper addresses a frequency domain binaural model that performs this binaural function based on interaural phase and level differences. The proposed model is evaluated not only as a front-end of a speech recognition system, but also as a speech enhancer. According to the evaluation, when the directions of arrival of the target signal and the noise differ by 10°, recognition rates improve in all cases in comparison with the previous time domain binaural model (TDBM). Furthermore, recognition rates exceed 90% when the signal to noise ratio (SNR) is higher than approximately 5 dB. In the evaluation as a speech enhancer, the SNR and coherence obtained with the frequency domain binaural model are superior to those of the TDBM.
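As a rough illustration of the two cues named in the abstract above, the following Python sketch computes an interaural phase difference (IPD) and an interaural level difference (ILD) at a single frequency, and then applies a simple pass/reject decision to that frequency bin. It is not the model proposed in the paper: the function names, the single-bin DFT, the binary mask and the tolerance values are all assumptions made here for illustration.

```python
import cmath
import math


def bin_spectrum(samples, sample_rate, freq_hz):
    """Single-frequency DFT of one frame: complex amplitude at freq_hz."""
    step = -2j * math.pi * freq_hz / sample_rate
    return sum(s * cmath.exp(step * n) for n, s in enumerate(samples))


def interaural_cues(left, right, sample_rate, freq_hz):
    """Return (IPD in radians, ILD in dB) at one frequency.

    Positive IPD means the left channel leads in phase;
    positive ILD means the left channel is louder.
    """
    spec_l = bin_spectrum(left, sample_rate, freq_hz)
    spec_r = bin_spectrum(right, sample_rate, freq_hz)
    # Phase of the cross-spectrum, wrapped to (-pi, pi].
    ipd = cmath.phase(spec_l * spec_r.conjugate())
    # Small floor avoids division by zero or log of zero on silent frames.
    floor = 1e-12
    ild = 20.0 * math.log10(max(abs(spec_l), floor) / max(abs(spec_r), floor))
    return ipd, ild


def bin_gain(ipd, ild, target_ipd, target_ild, ipd_tol=0.2, ild_tol=3.0):
    """Hypothetical binary mask: 1.0 if the cues match the target direction.

    The tolerances (radians, dB) are illustrative values, not from the paper.
    """
    phase_err = abs(cmath.phase(cmath.exp(1j * (ipd - target_ipd))))
    level_err = abs(ild - target_ild)
    return 1.0 if phase_err <= ipd_tol and level_err <= ild_tol else 0.0


if __name__ == "__main__":
    fs, f, n = 16000, 500.0, 320
    left = [math.sin(2 * math.pi * f * i / fs) for i in range(n)]
    # Right channel: half the amplitude, lagging by 0.3 rad.
    right = [0.5 * math.sin(2 * math.pi * f * i / fs - 0.3) for i in range(n)]
    print(interaural_cues(left, right, fs, f))
```

A complete front-end would presumably evaluate such cues over many frequency bins and frames and weight each bin before resynthesis or recognition; the sketch only shows the cue computation for one bin.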
To track rapid pitch transients, the frame length required by some conventional pitch detection methods is too long. Although there are wavelet-based pitch detection methods that require only a few pitch periods per frame, they are not sufficiently robust against noise. This paper proposes a new pitch detection method that works properly in noisy environments even when the frame duration is short. The proposed method consists of a power level detector, a signal analyzer, an autocorrelator, a voiced-unvoiced detector and a lag time interpolator. The signal analyzer is based on the continuous wavelet transform using a harmonic analyzing wavelet. Using the harmonic analyzing wavelet provides more information about the pitch in the scalogram. Simulations of pitch detection are performed for a harmonic chirp signal and for speech signals. Performance is compared with that of two conventional pitch detection methods, the cepstrum method and the modified correlation method. The results show that the proposed method detects pitch in a noisy environment better than the two conventional methods. In particular, the largest improvement in performance is obtained for male voices.
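Three of the stages listed in the abstract above (autocorrelator, voiced-unvoiced detector, lag time interpolator) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the proposed method: it applies plain autocorrelation directly to the waveform and omits the power level detector and the wavelet-based signal analyzer. The function name, the pitch search range, the voicing threshold and the use of parabolic interpolation are all choices made here for illustration.

```python
import math


def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=500.0,
                   voicing_threshold=0.3):
    """Estimate the pitch (Hz) of one short frame, or None if unvoiced.

    f_min, f_max and voicing_threshold are illustrative assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]

    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 2, int(sample_rate / f_min))
    if lag_max < lag_min:
        return None  # frame too short for the requested pitch range

    def acf(lag):
        # Biased (unnormalized by length) autocorrelation at one lag.
        return sum(x[i] * x[i + lag] for i in range(n - lag))

    r0 = acf(0)
    if r0 <= 0.0:
        return None  # silent frame

    # Normalized autocorrelation, one extra lag on each side for interpolation.
    r = {lag: acf(lag) / r0 for lag in range(lag_min - 1, lag_max + 2)}
    best = max(range(lag_min, lag_max + 1), key=lambda k: r[k])

    # Voiced-unvoiced decision: require a sufficiently strong peak.
    if r[best] < voicing_threshold:
        return None

    # Lag interpolation: fit a parabola through the peak and its neighbours.
    a, b, c = r[best - 1], r[best], r[best + 1]
    denom = a - 2.0 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0.0 else 0.0
    return sample_rate / (best + offset)


if __name__ == "__main__":
    fs = 8000
    # 30 ms frame of a 200 Hz sine.
    tone = [math.sin(2 * math.pi * 200.0 * i / fs) for i in range(240)]
    print(estimate_pitch(tone, fs))
```

Interpolating the lag matters most for short frames and low sample rates, where a pitch period spans only a few dozen samples and an integer lag alone gives coarse frequency resolution.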