Abstract: This paper presents a bottom-up approach that combines audio and video to simultaneously locate individual speakers in the video (2-D source localization) and segment their speech (speaker diarization) in meetings recorded by a single stationary camera and a single microphone. The novelty lies in using motion information from the entire body, rather than just the face, to perform these tasks, which, unlike previous work, permits processing of nonfrontal views. Since body movements do not exhibit instantaneous signal-level synchrony with speech, the approach targets long-term co-occurrences between audio and video subspaces.
We present a system to retrieve all clips from a meeting archive that show a particular individual speaking, using a single face or voice sample as the query. The system incorporates three novel ideas. One, rather than matching the query to each individual sample in the archive, samples within a meeting are grouped first, generating a cluster of samples per individual. The query is then matched to the cluster, taking advantage of multiple samples to yield a robust decision. Two, automatic audio-visual association is performed, which allows bi-modal retrieval of clips even when the query is uni-modal. Three, the biometric recognition uses individual-specific score distributions learnt from the clusters, in a likelihood-ratio-based decision framework that obviates the need for explicit normalization or modality weighting. The resulting system, which is completely automated, achieves 92.6% precision at 90% recall on a dataset of 16 real meetings spanning a total of 13 hours.
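The likelihood-ratio decision in the third idea can be illustrated with a minimal sketch. The abstract does not specify the model, so everything here is an assumption made for illustration: the function names are hypothetical, the genuine and impostor score distributions of one cluster are modeled as Gaussians, per-sample scores are treated as independent, and a log-likelihood ratio above zero is taken as a match. Because each modality contributes a log-likelihood ratio on the same scale, face and voice evidence can simply be summed without separate normalization or weighting.

```python
import math
from statistics import mean, stdev


def gaussian_logpdf(x, mu, sigma):
    """Log density of a 1-D Gaussian (assumed score model, not from the paper)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_likelihood_ratio(query_scores, genuine_scores, impostor_scores):
    """Sum of per-score log-likelihood ratios for one cluster and one modality.

    genuine_scores: similarity scores between samples of the same cluster.
    impostor_scores: similarity scores between this cluster and other clusters.
    query_scores: similarity scores between the query and the cluster's samples.
    """
    g_mu, g_sigma = mean(genuine_scores), stdev(genuine_scores)
    i_mu, i_sigma = mean(impostor_scores), stdev(impostor_scores)
    return sum(
        gaussian_logpdf(s, g_mu, g_sigma) - gaussian_logpdf(s, i_mu, i_sigma)
        for s in query_scores
    )


def is_match(modality_llrs, threshold=0.0):
    """Fuse modalities by summing their log-likelihood ratios (assumed threshold of 0)."""
    return sum(modality_llrs) > threshold


# Toy scores, invented purely for illustration.
genuine = [0.80, 0.90, 0.85, 0.95]
impostor = [0.20, 0.30, 0.25, 0.35]

llr_same = log_likelihood_ratio([0.88, 0.90], genuine, impostor)
llr_other = log_likelihood_ratio([0.25, 0.30], genuine, impostor)
print(is_match([llr_same]), is_match([llr_other]))
```

In this toy setup, query scores near the cluster's genuine distribution give a positive ratio and scores near the impostor distribution give a negative one; a real system would estimate the distributions from the clustered archive samples.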