Abstract: We can communicate with others in a noisy environment. This phenomenon is known as the "cocktail party effect" and is one of the most important binaural functions. This paper presents a frequency domain binaural model that performs this binaural function based on interaural phase and level differences. The proposed model is evaluated not only as a front end for a speech recognition system but also as a speech enhancer. According to the evaluation, when the directions of arrival of the target signal and the noise differ by 10°, recognition rates improve over the previous time domain binaural model (TDBM) in all cases. Furthermore, recognition rates exceed 90% when the signal-to-noise ratio (SNR) is higher than approximately 5 dB. In the speech enhancer evaluation, the frequency domain binaural model also outperforms the TDBM in both SNR and coherence.
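As a rough illustration of the two cues the model relies on, the sketch below computes a per-frequency-bin interaural phase difference and level difference from one frame of a two-channel signal. It is a minimal sketch, not the model from the paper: the function name `interaural_cues`, the Hann window, the frame length, and the synthetic test tone are all assumptions made for this example.

```python
import numpy as np

def interaural_cues(left, right, eps=1e-12):
    """Per-bin interaural phase difference (rad) and level difference (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    window = np.hanning(len(left))
    spec_l = np.fft.rfft(left * window)
    spec_r = np.fft.rfft(right * window)
    # Phase of the left channel relative to the right, wrapped to (-pi, pi].
    ipd = np.angle(spec_l * np.conj(spec_r))
    # Level of the left channel relative to the right, in dB.
    ild = 20.0 * np.log10((np.abs(spec_l) + eps) / (np.abs(spec_r) + eps))
    return ipd, ild

# Assumed test signal: a 500 Hz tone whose right channel is delayed by
# 0.25 ms and attenuated to half amplitude.
fs = 16000
n = 1024
t = np.arange(n) / fs
left = np.sin(2 * np.pi * 500 * t)
right = 0.5 * np.sin(2 * np.pi * 500 * (t - 0.00025))

ipd, ild = interaural_cues(left, right)
k = int(round(500 * n / fs))  # index of the FFT bin containing the tone
print(f"IPD at 500 Hz: {ipd[k]:.3f} rad, ILD at 500 Hz: {ild[k]:.2f} dB")
```

For this tone the phase difference should be close to 2π · 500 · 0.00025 = π/4 rad and the level difference close to 20 log10(2) ≈ 6 dB. A full model would go on to map such cues to a direction of arrival in each bin, which this sketch does not attempt.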
Well-designed instructional material is equally important for successful e-Learning implementation. Teachers and instructors play a major role in designing and building learning content. On the one hand, this requires effort, time and experience. On the other hand, good learning content is usually the result of repeated revision informed by teaching experience and by the evaluation of student activities. For higher educational institutions (HEI) in developing countries (such as Indonesia), sharing resources is highly recommended as a way to avoid high costs and redundant work, and e-Learning is no exception. One way to do so is to share and reuse e-Learning content on a particular subject between Learning Management Systems (LMS). In addition, under collaborative teaching the content may continue to develop while it is being shared. Thus, the capability to synchronize content between LMS is necessary. However, a typical e-Learning implementation might not be appropriate given the state of the network infrastructure in developing countries. In some areas, the network has limited bandwidth and suffers frequent disconnections. This paper introduces a novel method of sharing e-Learning content between distributed Learning Management Systems by using dynamic content synchronization. The method also suits course sharing in support of collaborative teaching. Moreover, the approach is designed to address the needs of content sharing in areas where network infrastructure is limited in bandwidth and availability.
Some conventional pitch detection methods require a frame length that is too long to track rapid pitch transients. Wavelet-based pitch detection methods need only a few pitch periods per frame, but they are not robust enough against noise. This paper proposes a new pitch detection method that works properly in noisy environments even when the frame duration is short. The proposed method consists of a power level detector, a signal analyzer, an autocorrelator, a voiced-unvoiced detector and a lag time interpolator. The signal analyzer is based on the continuous wavelet transform using a harmonic analyzing wavelet. Using the harmonic analyzing wavelet yields more information about the pitch in the scalogram. Pitch detection is simulated for a harmonic chirp signal and for speech signals. Performance is compared with two conventional pitch detection methods, the cepstrum method and the modified correlation method. The results show that the proposed method detects pitch in a noisy environment better than the two conventional methods. In particular, the largest improvement is obtained for male voices.
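Two of the stages named above, the autocorrelator and the lag time interpolator, can be sketched in a generic form. The example below is a minimal sketch under stated assumptions, not the proposed method: it omits the power level detector, the wavelet-based signal analyzer and the voiced-unvoiced detector, and it uses plain autocorrelation with parabolic interpolation of the peak lag, a common technique that may differ from the interpolator in the paper. The function name `estimate_pitch`, the search range and the test signal are hypothetical.

```python
import numpy as np

def estimate_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate pitch (Hz) of one frame via autocorrelation peak picking.

    Hypothetical helper for illustration; not taken from the paper.
    """
    frame = frame - np.mean(frame)
    n = len(frame)
    # Autocorrelation for non-negative lags (index 0 is lag 0).
    acf = np.correlate(frame, frame, mode="full")[n - 1:]
    # Restrict the search to lags corresponding to fmax..fmin.
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), n - 2)
    lag = lo + int(np.argmax(acf[lo:hi + 1]))
    # Parabolic interpolation around the peak for a fractional lag.
    a, b, c = acf[lag - 1], acf[lag], acf[lag + 1]
    denom = a - 2.0 * b + c
    shift = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return fs / (lag + shift)

# Assumed test frame: 30 ms of a 210 Hz tone with a weaker second harmonic.
fs = 8000
t = np.arange(240) / fs
frame = np.sin(2 * np.pi * 210 * t) + 0.5 * np.sin(2 * np.pi * 420 * t)

pitch = estimate_pitch(frame, fs)
print(f"Estimated pitch: {pitch:.1f} Hz")
```

On this clean test frame the estimate is expected to land within a few hertz of 210 Hz. The interpolation step matters because, at 8 kHz, neighbouring integer lags near 210 Hz differ by more than 5 Hz.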
Sound source localization and signal segregation using a small number of microphone elements are expected to be used not only in multimedia products but also in daily-use products, such as hearing aids. The frequency domain binaural model can localize a sound source and segregate signals coming from a specific direction using two input signals. In this paper, a method for localizing two sound sources in azimuth and elevation using interaural phase and level differences is proposed. The localization performance is examined by computer simulations for two concurrent speakers. In addition, the performance of the proposed method on the median plane is confirmed. The results show that the proposed method can localize two sound sources in azimuth and elevation simultaneously. The probability of localizing one sound source within 10° error is 60-80% when the segmental power ratio is 0 dB. The probability for the other sound source is 40-70%.