Recognition of Creaky Voice from Emergency Calls

Tavi, Lauri; Alumäe, Tanel; Werner, Stefan

doi:10.21437/interspeech.2019-1253

Cited by 8 publications (10 citation statements)

References 17 publications

“…Some works analyze the difference in feature variations due to disguise for different genders [5,7]. Tavi et al [17] suggest that the effect of the speaker's sex on creakiness should be treated carefully. González Hautamäki et al [31] did an extensive study of how certain features are affected differently in male and female speakers in three voice conditions: modal, intended old and intended child.…”

Section: Gender and Disguise Type Impact (mentioning; confidence: 99%)

“…Identifying whether a given test speech sample is disguised or original is the first step in automatic speaker recognition (ASR) from disguised voices. In some works, deep features and neural network classifiers are used for this classification [15–18]. This classification is done in the literature using both prosodic and cepstral features [16, 18–21].…”

Section: Introduction (mentioning; confidence: 99%)

“…This classification is done in the literature using both prosodic and cepstral features [16, 18–21]. Most works consider specific types of disguise, such as pitch-disguised voices [16, 18–20], creaky voices [9, 17] and mimicked voices [15, 21]. Some related works in the literature analyzed which voice features are affected by disguise and which are robust to it.…”

Section: Introduction (mentioning; confidence: 99%)

Classification of Pitch and Gender of Speakers for Forensic Speaker Recognition from Disguised Voices Using Novel Features Learned by Deep Convolutional Neural Networks

Nair, Savithri

2021

Voice disguise is a major concern in forensic automatic speaker recognition (FASR). Classifying the type of disguise is very important for speaker recognition. Pitch disguise is a very common type of disguise that criminals attempt. Among the different types of disguise, high-pitch and low-pitch voices show more distortion than the others. The features that are robust for high-pitch and low-pitch voices are different. Moreover, the effect of disguise differs between male and female voices. In this work, we classified high-pitch and low-pitch disguised voices for male and female speakers using a novel set of features. We arranged Mel frequency cepstral coefficients (MFCC), ΔMFCC, and ΔΔMFCC features as three-dimensional features and gave them as RGB-equivalent spectrogram inputs to the pretrained AlexNet deep convolutional neural network (DCNN). We fused the AlexNet output features with the corresponding MFCC correlation features. These fused features are the proposed novel features for disguise classification. Classification is performed using neural network (NN) and support vector machine (SVM) classifiers. Simulation results show that classification with the SVM classifier using these novel features gives an improved accuracy of 98.89%, compared to the 95.99% accuracy obtained using DCNN output features from traditional spectrogram inputs.
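To make the "MFCC, ΔMFCC and ΔΔMFCC as three channels" idea concrete, here is a minimal sketch of the channel-stacking step only. It assumes an MFCC matrix (frames × coefficients) has already been computed, and it uses the common regression formula for delta coefficients with a ±2-frame window and edge-frame repetition; the abstract does not state the authors' delta window, how the array is resized for AlexNet, or how the "MFCC correlation features" are defined, so those parts are not shown. The function names `deltas` and `stack_as_rgb` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def deltas(features: Matrix, n: int = 2) -> Matrix:
    """Regression-based delta coefficients over a +/- n frame window.

    Assumption: frames beyond the edges are replaced by the first/last frame.
    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2)
    """
    num_frames = len(features)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: Matrix = []
    for t in range(num_frames):
        row = []
        for d in range(len(features[t])):
            acc = 0.0
            for k in range(1, n + 1):
                ahead = features[min(t + k, num_frames - 1)][d]
                behind = features[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_as_rgb(mfcc: Matrix, n: int = 2) -> List[List[List[float]]]:
    """Return a frames x coeffs x 3 array, like an RGB image:
    channel 0 = MFCC, channel 1 = delta, channel 2 = delta-delta."""
    d1 = deltas(mfcc, n)
    d2 = deltas(d1, n)  # delta-delta is the delta of the deltas
    return [
        [[mfcc[t][c], d1[t][c], d2[t][c]] for c in range(len(mfcc[t]))]
        for t in range(len(mfcc))
    ]
```

With a linearly increasing coefficient (one unit per frame), the interior delta values are 1.0 and the interior delta-delta values are 0.0, which is a quick sanity check of the regression formula.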

“…In the third study (Tavi, Alumäe & Werner 2019), an automatic detector of creaky voice quality was developed for emergency calls. The detector was based on deep learning methods, which are an advanced subfield of machine learning.…”

Section: Main Research Findings (unclassified)

Hädänalaisen puheen prosodia [The prosody of speech in distress]

Tavi

2020

Virittäjä

Self Cite

Lauri Tavi's doctoral dissertation in phonetics was publicly examined on Friday, 26 June 2020, at the University of Eastern Finland. The opponent was Professor Emerita Anna-Maija Korpijaakko-Huuhka of Tampere University, and the custos was Docent Stefan Werner. Lauri Tavi: Prosodic cues of speech under stress: Phonetic exploration of Finnish emergency calls. Dissertations in Education, Humanities, and Theology; 154. Joensuu: University of Eastern Finland, 2020. The dissertation is available at https://epublications.uef.fi/pub/urn_isbn_978-952-61-3403-1/urn_isbn_978-952-61-3403-1.pdf

“…On the other hand, technological advances have made it possible to assess how effective these speech recognition systems are in specific cases. For example, there are automatic speech recognition (ASR) systems based on convolutional neural networks (ConvNet/CNN), developed specifically for emergency calls, whose purpose is to detect the speaker's emotional state and verify the speaker's intentional authenticity (Tavi et al, 2019).…”

unclassified

Reconocimiento del habla con acento español basado en un modelo acústico [Recognition of Spanish-accented speech based on an acoustic model]

et al. 2022

The objective of the article was to build an automatic speech recognition (ASR) model based on converting human speech into text, which is considered one of the branches of artificial intelligence. Speech analysis makes it possible to identify acoustic, phonetic, syntactic and semantic information about words, among other elements that can reveal ambiguous terms, pronunciation errors, and words that are syntactically similar but semantically different, all of which are characteristic features of human language. The model focused on the acoustic analysis of words and proposed a methodology for acoustic recognition based on transcriptions of audio recordings containing human speech; the word error rate was used to measure the model's accuracy. The audio recordings are emergency calls logged by the ECU911 Integrated Security Service (Servicio Integrado de Seguridad ECU911). The model was trained offline with the CMUSphinx toolkit for Spanish. The results showed that the word error rate depends on the amount of audio: the more audio, the fewer erroneous words and the higher the model's accuracy. The study concluded by emphasizing the duration of each recording as a variable that affects the model's accuracy.
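The word error rate (WER) used above is conventionally defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the reference transcript into the recognizer's output, and N is the number of reference words. The sketch below computes it with a word-level Levenshtein distance. It is a generic illustration, not the scoring tool used with CMUSphinx in the cited study; the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein (edit) distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = minimum edits turning the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("llamada de emergencia en quito", "llamada emergencia en quito norte")` counts one deletion ("de") and one insertion ("norte") against five reference words, giving 0.4. Lower values mean a more accurate model, which is how the abstract's "fewer erroneous words" should be read.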
