In this paper, we describe a speaker diarization system that localizes and identifies all speakers present in a conversation or meeting. We propose a novel systematic approach to several long-standing challenges in speaker diarization: (1) segmenting and separating overlapping speech from two speakers; (2) estimating the number of speakers when participants may enter or leave the conversation at any time; (3) providing accurate speaker identification on short text-independent utterances; (4) tracking speakers' movement during the conversation; and (5) detecting speaker changes in real time. First, an approach based on a differential directional microphone array is used to capture the target speakers' voices in adverse far-field environments. Second, an online speaker-location joint clustering approach is proposed to keep track of speaker location. Third, an instant speaker number detector is developed to trigger the mechanism that separates overlapped speech. The results suggest that our system effectively incorporates spatial information and achieves significant gains.
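The abstract does not specify how the online speaker-location joint clustering works, so the following is only a minimal sketch of one generic way to cluster incrementally on both a speaker embedding and a direction-of-arrival angle. The class name `OnlineJointClusterer`, the parameters `threshold` and `angle_weight`, and the distance formula are all hypothetical choices made for illustration, not the authors' method.

```python
import math


class OnlineJointClusterer:
    """Illustrative online clustering over (speaker embedding, arrival angle) pairs.

    All names and the distance formula are assumptions for this sketch.
    """

    def __init__(self, threshold, angle_weight=1.0):
        self.threshold = threshold        # max joint distance to join a cluster
        self.angle_weight = angle_weight  # relative weight of the spatial term
        self.clusters = []                # each: {"emb": [...], "angle": deg, "count": n}

    def _distance(self, cluster, emb, angle):
        # Euclidean distance between the cluster centroid and the new embedding.
        emb_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(cluster["emb"], emb)))
        # Angular difference in degrees, wrapped into [0, 180].
        diff = abs(cluster["angle"] - angle) % 360.0
        diff = min(diff, 360.0 - diff)
        return emb_dist + self.angle_weight * diff / 180.0

    def update(self, emb, angle):
        """Assign one observation to a cluster and return the cluster index."""
        best, best_dist = None, float("inf")
        for i, cluster in enumerate(self.clusters):
            dist = self._distance(cluster, emb, angle)
            if dist < best_dist:
                best, best_dist = i, dist
        if best is None or best_dist > self.threshold:
            # No cluster is close enough: treat this as a new speaker.
            self.clusters.append({"emb": list(emb), "angle": angle, "count": 1})
            return len(self.clusters) - 1
        cluster = self.clusters[best]
        cluster["count"] += 1
        rate = 1.0 / cluster["count"]
        # Running mean of the embedding centroid.
        cluster["emb"] = [a + rate * (b - a) for a, b in zip(cluster["emb"], emb)]
        # Keep the latest angle so a moving speaker's location is followed.
        cluster["angle"] = angle
        return best
```

In this sketch, a new cluster is opened whenever an observation is farther than `threshold` from every existing cluster, which is one simple way to let the speaker count grow as participants join. Storing the most recent angle rather than an average is a design choice that favors tracking movement over smoothing noise.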
Performance degradation caused by noise has been a long-standing challenge for speaker verification. Previous methods usually involve applying a denoising transformation to speaker embeddings or enhancing input features. Nevertheless, these methods are lossy and inefficient for speaker embedding. In this paper, we propose context-aware masking (CAM), a novel method to extract robust speaker embeddings. CAM enables the speaker embedding network to "focus" on the speaker of interest and "blur" unrelated noise. The masking threshold is dynamically controlled by an auxiliary context embedding that captures speaker and noise characteristics. Moreover, models adopting CAM can be trained in an end-to-end manner without using synthesized noisy-clean speech pairs. Our results show that CAM improves speaker verification performance in the wild by a large margin compared to the baselines.
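To make the idea of a context-controlled masking threshold concrete, here is a minimal sketch under strong assumptions. The abstract does not give the formulation, so the mean-pooled context vector, the per-channel affine map (`weights`, `bias`), and the sigmoid soft mask are all hypothetical stand-ins rather than the CAM architecture itself.

```python
import math


def context_aware_mask(frames, weights, bias):
    """Apply a soft mask whose threshold depends on an utterance-level context.

    frames:  list of T frames, each a list of C feature values.
    weights, bias: hypothetical per-channel parameters (length C) that map
                   the context vector to a masking threshold.
    Returns (masked_frames, thresholds).
    """
    num_frames = len(frames)
    num_channels = len(frames[0])

    # Context embedding: assumed here to be the per-channel mean over time.
    context = [sum(f[c] for f in frames) / num_frames for c in range(num_channels)]

    # Threshold per channel, derived from the context (assumed affine map).
    thresholds = [weights[c] * context[c] + bias[c] for c in range(num_channels)]

    masked = []
    for f in frames:
        # Soft mask in (0, 1): close to 1 well above the threshold,
        # close to 0 well below it.
        mask = [1.0 / (1.0 + math.exp(-(f[c] - thresholds[c])))
                for c in range(num_channels)]
        masked.append([f[c] * mask[c] for c in range(num_channels)])
    return masked, thresholds
```

Because the threshold is computed from the whole utterance, the same feature value can be kept in one recording and attenuated in another, which is the sense in which the mask is "context-aware" in this sketch. In a trained network, the context extractor and the threshold map would be learned jointly with the embedding model instead of being fixed as here.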