2018
DOI: 10.1186/s13636-018-0140-x

The use of long-term features for GMM- and i-vector-based speaker diarization systems

Abstract: Several factors contribute to the performance of speaker diarization systems. For instance, the appropriate selection of speech features is one of the key aspects that affect speaker diarization systems. The other factors include the techniques employed to perform both segmentation and clustering. While the static mel frequency cepstral coefficients are the most widely used features in speech-related tasks including speaker diarization, several studies have shown the benefits of augmenting regular speech featu…

Cited by 14 publications (15 citation statements)
References 26 publications

“…Given the achieved progress in separating the main patterns of the radio broadcasted streams (Voice, Phone, Music and main speakers, telephone conversations and music interferences (VPM) scheme) [1-3], the proposed framework investigates language detection mechanisms, targeting the segmentation of spoken content at a linguistic level. Focused research was conducted in speaker diarization/verification problems or sentiment analysis via Gaussian Mixture Modeling with cepstral properties [8,17]. Moreover, innovative Deep Learning and Convolutional Neural Networks architectures are deployed in this direction [4,5,35] with 2-D input features [7,15].…”
Section: Problem Definition and Motivation (mentioning)
confidence: 99%
“…The audio signals were formatted (transcoded) to PCM (Pulse-Code Modulation) Wav files (16-bit depth, 44,100 Hz sample rate). At the same time, the stereo property was discarded, since it could serve only for the music/genre discrimination and not for voice (and language) recognition, as it was thoroughly studied in [16][17][18][19][20].…”
Section: Data Collection-content Preprocessing (mentioning)
confidence: 99%
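
The preprocessing quoted in this statement (16-bit PCM WAV with the stereo property discarded) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `downmix_to_mono` and `stereo_wav_to_mono` are hypothetical, and averaging the left and right channels is an assumed way of discarding stereo, since the quoted text does not say how the citing authors did it.

```python
import array
import wave


def downmix_to_mono(stereo: array.array) -> array.array:
    """Average interleaved left/right 16-bit samples into one mono channel.

    Averaging is an assumption; the quoted statement only says that the
    stereo property was discarded.
    """
    if len(stereo) % 2:
        raise ValueError("interleaved stereo data needs an even sample count")
    left, right = stereo[0::2], stereo[1::2]
    # The mean of two int16 values always fits in int16.
    return array.array("h", ((l + r) // 2 for l, r in zip(left, right)))


def stereo_wav_to_mono(src, dst) -> None:
    """Rewrite a 16-bit stereo PCM WAV as 16-bit mono, keeping the sample rate.

    `src` and `dst` may be file paths or binary file-like objects.
    """
    with wave.open(src, "rb") as reader:
        if reader.getnchannels() != 2 or reader.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM input")
        rate = reader.getframerate()  # e.g. 44100 Hz in the quoted setup
        samples = array.array("h")
        samples.frombytes(reader.readframes(reader.getnframes()))

    mono = downmix_to_mono(samples)

    with wave.open(dst, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit depth
        writer.setframerate(rate)
        writer.writeframes(mono.tobytes())
```

A call such as `stereo_wav_to_mono("in.wav", "out.wav")` (file names are placeholders) would leave the sample rate untouched and change only the channel count.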