Interspeech 2019
DOI: 10.21437/interspeech.2019-1253

Recognition of Creaky Voice from Emergency Calls

Abstract: Although creaky voice, or vocal fry, is a widely studied phonation mode, open questions remain about creak's acoustic characterization and automatic recognition. Many of these questions persist because creak varies significantly with conversational context. In this study, we introduce an exploratory creak recognizer based on a convolutional neural network (CNN), developed specifically for emergency calls. The study focuses on recognition of creaky voice from authentic emergency calls because creak detecti…

citations
Cited by 8 publications
(10 citation statements)
references
References 17 publications
“…Some works analyze the difference in feature variations due to disguise for different genders [5,7]. Tavi et al [17] suggests that the effect of speaker's sex on creakiness should be treated carefully. González Hautamäki et al [31] did an extensive study of how certain features are affected in male and female speakers differently in three voice conditions; modal, intended old and intended child.…”
Section: Gender and Disguise Type Impact (mentioning)
confidence: 99%
“…Identifying whether a given test speech is disguised or original is the first step in ASR from disguised voices. In some works, deep features and neural network classifiers are used for this classification [15–18]. This classification is done in literature using both prosodic and cepstral features [16, 18–21].…”
Section: Introduction (mentioning)
confidence: 99%
“…In the third study (Tavi, Alumäe & Werner 2019), an automatic recognizer of creaky voice quality was developed for emergency calls. The recognizer was based on deep learning methods, which are an advanced subfield of machine learning.…”
Section: Key Research Findings (unclassified)
“…On the other hand, advances in technology have made it possible to assess how effective these voice recognition systems are in specific cases. Thus, automatic voice recognition (RAV) systems based on convolutional neural networks (ConvNet/CNN), which are built especially for emergency calls and whose purpose is to detect the emotional state and verify the intentional authenticity of the speaker (Tavi et al., 2019).…”
unclassified