Small informal meetings of two to four participants are very common in work environments, so a convenient way to record and archive them is of great interest. To archive such meetings efficiently, an important task is keeping track of "who talked when" during a meeting. This paper proposes a new multi-modal approach to this speaker activity detection problem. One novelty of the proposed approach is that it uses a human tracker that relies on scanning laser range finders (LRFs) to localize the participants. This choice is especially relevant for robotic applications, as robots are often equipped with LRFs for navigation purposes. In the proposed system, a table-top microphone array in the center of the meeting room acquires the audio data while the LRF-based human tracker monitors the movement of the participants. Speaker activity detection is then performed using Gaussian mixture models that were trained beforehand. An experiment reproducing a meeting configuration demonstrates the performance of the system for speaker activity detection. In particular, the proposed hands-free system maintains a good level of performance compared to the use of close-talking microphones when participants are speaking simultaneously.