Interspeech 2015
DOI: 10.21437/interspeech.2015-110

Using voice-quality measurements with prosodic and spectral features for speaker diarization

Abstract: Jitter and shimmer voice-quality measurements have been successfully used to detect voice pathologies and classify different speaking styles. In this paper, we investigate the usefulness of jitter and shimmer voice measurements in the framework of the speaker diarization task. The combination of jitter and shimmer voice-quality features with the long-term prosodic and short-term spectral features is explored in a subset of the Augmented Multi-party Interaction (AMI) corpus, a multi-party and spontaneous speech …
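
For readers unfamiliar with the two measurements named in the abstract: jitter describes cycle-to-cycle variation of the pitch period, and shimmer describes cycle-to-cycle variation of the peak amplitude. The sketch below uses the common "local" (relative) definition of each. The excerpt does not say which variants the paper uses, so the formulas and the function names `local_jitter` and `local_shimmer` are assumptions made for illustration only.

```python
def _mean_abs_consecutive_diff(values):
    """Mean absolute difference between each value and its predecessor."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(cur - prev) for prev, cur in zip(values[:-1], values[1:])]
    return sum(diffs) / len(diffs)


def local_jitter(periods):
    """Relative local jitter (assumed definition, not taken from the paper).

    periods: pitch-period durations of consecutive glottal cycles, in seconds.
    Returns the mean absolute period difference divided by the mean period.
    """
    return _mean_abs_consecutive_diff(periods) / (sum(periods) / len(periods))


def local_shimmer(amplitudes):
    """Relative local shimmer (assumed definition, not taken from the paper).

    amplitudes: peak amplitudes of consecutive glottal cycles.
    Returns the mean absolute amplitude difference divided by the mean amplitude.
    """
    return _mean_abs_consecutive_diff(amplitudes) / (
        sum(amplitudes) / len(amplitudes)
    )
```

A perfectly periodic voice gives a jitter of zero under this definition; larger values indicate more irregular vocal-fold vibration.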

Cited by 9 publications (8 citation statements)
References 18 publications
“…Moreover, using jitter and shimmer measurements together with cepstral ones improves the classification accuracy of different speaking styles [17]. Such voice-quality features are also important in speaker diarization [20], and they can be used to characterize different types of voices such as breathy, tense, harsh, whispery and creaky [16].…”
Section: Introduction (mentioning)
confidence: 99%
“…Short-term features are extracted from a single speech frame, while long-term features are extracted from portions of speech longer than one frame. Since long-term features provide discriminative power, fusion of short-term spectral features with long-term features has been applied on different speech applications [3]-[5]. Long-term speech features are also robust to channel variation since temporal patterns do not change with the change of acoustic conditions [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…Moreover, using jitter and shimmer measurements together with cepstral ones improves the classification accuracy of different speaking styles [8]. Such voice-quality features are also important in speaker diarization [5], [11]-[13], and they can be used to characterize different types of voices such as breathy, tense, harsh, whispery and creaky [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…Experimental results show that the best results are obtained by fusing the voice-quality features with the prosodic ones at the feature level, and then fusing them with the cepstral features at the score level. The results of this work has been published in [Woubie et al, 2015].…”
Section: Using Voice-quality Measurements With Prosodic and Spectral ... (mentioning)
confidence: 99%
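
The two fusion levels named in the statement above can be illustrated with a minimal sketch. Feature-level fusion is shown as per-frame concatenation of feature vectors, and score-level fusion as a weighted linear sum of two scores. The statement gives neither the combination rule nor any weights, so the linear sum, the default `weight_a`, and both function names are assumptions rather than the authors' method.

```python
def feature_level_fusion(voice_quality, prosodic):
    """Concatenate two per-frame feature streams into one stream.

    Both arguments are lists of per-frame feature vectors (lists of floats)
    and must contain the same number of frames.
    """
    if len(voice_quality) != len(prosodic):
        raise ValueError("streams must have the same number of frames")
    return [vq + pr for vq, pr in zip(voice_quality, prosodic)]


def score_level_fusion(score_a, score_b, weight_a=0.5):
    """Combine two scores with a weighted sum (assumed rule and weight).

    score_a could come from a model trained on cepstral features and
    score_b from a model trained on the fused voice-quality and prosodic
    features; weight_a is a hypothetical tuning parameter in [0, 1].
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return weight_a * score_a + (1.0 - weight_a) * score_b
```

Under these assumptions, the voice-quality and prosodic vectors are first joined frame by frame, a model is trained on the joined stream, and its score is then mixed with the score of the cepstral model.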
“…Since fusion techniques extract multiple information from multiple sources and improve accuracy, they have been successfully used in various tasks including speaker recognition [Farrús et al, 2007], speaker diarization [Friedland et al, 2009, Zelenák and Hernando, 2011, Woubie et al, 2015] and multi-biometrics [Nandakumar et al, 2009, Nandakumar et al, 2008].…”
Section: Fusion Techniques (mentioning)
confidence: 99%