2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013
DOI: 10.1109/icassp.2013.6639228

Speaker identification from shouted speech: Analysis and compensation

Abstract: Text-independent speaker identification is studied using neutral and shouted speech in Finnish to analyze the effect of vocal mode mismatch between training and test utterances. Standard mel-frequency cepstral coefficient (MFCC) features with a Gaussian mixture model (GMM) recognizer are used for speaker identification. The results indicate that speaker identification accuracy drops from perfect (100 %) to 8.71 % under vocal mode mismatch. Because of this dramatic degradation in recognition accuracy, we propos…

Cited by 13 publications (12 citation statements)
References: 10 publications
“…A number of stress compensation strategies were also formulated, which resulted in significant improvement in ASR (Hansen and Clements, 1989;Hansen, 1996). Later studies considered individual speech modes such as whisper (Fan and Hansen, 2011) and shout (Hanilci et al, 2013) in the context of speaker identification and proposed different compensation strategies to normalize mismatch.…”
Section: Introduction (mentioning)
confidence: 99%
“…Speaker recognition process encompasses three terms identification, verification and diarization. In automatic speaker identification (Figure 1), there is no priori identity claim, the system decides who the person is, or the person is known or unknown [2] [12]. While automatic speaker verification (Figure 2) involves the use of a machine to verify a person's claimed identity from his voice [6,7,12].…”
Section: Speaker Recognition (mentioning)
confidence: 99%
“…Another strategy, which has not been widely explored, is to use feature mapping. A recent study showed that such an approach can be helpful in speaker identification scenarios when the input presented is shouted speech [10]. For feature mapping, neural networks and Gaussian mixture models have been widely used in the voice conversion and voiced speech reconstruction (from whispered speech) literature [11,12,13].…”
Section: Introduction (mentioning)
confidence: 99%