2013 IEEE International Conference on Acoustics, Speech and Signal Processing 2013
DOI: 10.1109/icassp.2013.6639171

Improved overlap speech diarization of meeting recordings using long-term conversational features

Abstract: Overlapping speech is a source of significant errors in speaker diarization of spontaneous meeting recordings. Recent works on speaker diarization have attempted to solve the problem of overlap detection using classifiers trained on acoustic and spatial features. This paper proposes a method to improve the short-term spectral feature based overlap detector by incorporating information from long-term conversational features in the form of speaker change statistics. The statistics are obtained at segment level (a…

Cited by 19 publications (11 citation statements)
References 9 publications
“…Various methods have been proposed to solve the problem of overlapping speaker segmentation including hidden markov model (HMM)-based methods that use Mel-frequency cepstral coefficients (MFCCs), linear predictive coding (LPC) and root mean square (RMS) energy features [12]. Methods that use long-term conversational features [13] have also been put forward along with multimodal techniques that use multiple microphone and camera systems [14]. More recently, deep learning approaches have become increasingly prevalent [15]- [18] which often require large amounts of labelled training data.…”
Section: Introduction (mentioning)
confidence: 99%
“…Existing approaches to the segmentation of overlapping speech include [10] where a HMM-based method, that used mel-frequency cepstral coefficients (MFCCs), linear predictive coding (LPC) and root mean square (RMS) energy features, was proposed. It has also been shown in [11] that the segmentation performance can be improved if long-term conversational features are utilised. Other methods include multimodal techniques that use multiple microphone and camera systems [12].…”
Section: Introduction (mentioning)
confidence: 99%
“…An approach presented in [15] uses the output of a voice activity detection system and the silence distribution to detect overlap. This work was extended by exploiting long-term conversational features for overlap detection [16]. Neural networks could improve overlap detection by analysing the context.…”
Section: Introduction (mentioning)
confidence: 99%