Mapping Across Feature Spaces in Forensic Voice Comparison: The Contribution of Auditory-Based Voice Quality to (Semi-)Automatic System Testing

Hughes, Vincent; Harrison, Philip; Foulkes, Paul; French, Peter; Kavanagh, Colleen; Segundo, Eugenia San

doi:10.21437/interspeech.2017-1508

Cited by 12 publications

(10 citation statements)

References 20 publications


“…nasality, creakiness) increased, decreased or remained the same when comparing high quality recordings and telephone-degraded recordings. We have investigated possible correlations between different long-term vocal tract output measures - including supralaryngeal settings - with the aim of finding how VQ analysis can complement long-term formant distributions (LTFDs) and Mel frequency cepstral coefficients calculated across entire speech samples (MFCCs; French et al 2015, Hughes et al 2017).…”

Section: The Vocal Profile Analysis: Applications, Issues and Challenges (mentioning)

confidence: 99%

The use of the Vocal Profile Analysis for speaker characterization: Methodological proposals

Segundo, Foulkes, French et al. 2018

Journal of the International Phonetic Association

Self Cite


Among phoneticians, the Vocal Profile Analysis (VPA) is one of the most widely used methods for the componential assessment of voice quality. Whether the ultimate goal of the VPA evaluation is the comparative description of languages or the characterization of an individual speaker, the VPA protocol shows great potential for different research areas of speech communication. However, its use is not without practical difficulties. Despite these, methodological studies aimed at explaining where, when and why issues arise during the perceptual assessment process are rare. In this paper we describe the methodological stages through which three analysts evaluated the voices of 99 Standard Southern British English male speakers, rated their voices using the VPA scheme, discussed inter-rater disagreements, and eventually produced an agreed version of VPA scores. These scores were then used to assess correlations between settings. We show that it is possible to reach a good degree of inter-rater agreement, provided that several calibration and training sessions are conducted. We further conclude that the perceptual assessment of voice quality using the VPA scheme is an essential tool in fields such as forensic phonetics but, foremost, that it can be adapted and modified to a range of research areas, and not necessarily limited to the evaluation of pathological voices in clinical settings.


“…Recordings were drawn from the DyViS corpus of young male standard southern British English speakers [12]. Of the 100 available speakers, 97 were used, based on prior testing outlined in [13]. SASR testing was carried out under four different conditions according to the technical quality of the offender sample.…”

Section: 1 Materials (mentioning)

confidence: 99%

“…The suspect recording (Task1) and the four versions of the offender recording (Task2) were prepared for analysis in the same way as described in §2.2 of [13]. This involved manual editing of recordings to remove non-speech sounds and overlapping speech, removal of sections containing clipping, voice activity detection to remove silences of greater than 100 ms (using the vadsohn function in the VOICEBOX toolkit [15]), and segmentation of the signal into consonants and vowels using stkCV [16].…”

Section: 2 Preparation Of Recordings (mentioning)

confidence: 99%

“…Delta coefficients were also appended to the feature vector for each frame. Bandwidths and deltas were included in this SASR system as they have generally been shown to improve performance [6,13].…”

Section: 3 Formant Extraction (mentioning)

confidence: 99%
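The quoted statement says that delta coefficients were appended to each frame's feature vector but does not say how they were computed. As a rough illustration only, the sketch below uses the common regression-style delta over a window of ±2 frames with edge padding. The function name `append_deltas`, the `width` parameter, and the window size are assumptions made for this example, not details taken from the cited paper.

```python
import numpy as np

def append_deltas(features, width=2):
    """Append regression-style delta coefficients to each frame.

    features: array of shape (frames, dims), e.g. formant frequencies
              and bandwidths per frame (hypothetical layout).
    width:    number of frames on each side used in the regression
              (an assumed value, not taken from the source).
    """
    feats = np.asarray(features, dtype=float)
    n_frames = feats.shape[0]
    # Repeat the first and last frames so edge frames have neighbours.
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    deltas = np.zeros_like(feats)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        deltas += n * (ahead - behind)
    deltas /= denom
    # Static features first, then their deltas.
    return np.hstack([feats, deltas])
```

For a feature that rises by one unit per frame, the interior deltas come out as 1.0, and a constant feature yields deltas of 0, which is a quick sanity check on the windowing.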

“…Mean formant values for each speaker based on pooled data for their HQ Task1 and Task2 samples were also used as independent variables - we predicted that low mean F1 would result in higher LLR variability as it is more susceptible to the ‘telephone effect’ which artificially alters F1 values [21,22]. Auditory-based judgments of supralaryngeal and laryngeal voice quality (using data described in [13]) were also used as independent variables. The best model fit was identified using model comparison based on ANOVAs.…”

Section: 5 Analysing Individuals (mentioning)

confidence: 99%


The Individual and the System: Assessing the Stability of the Output of a Semi-automatic Forensic Voice Comparison System

Hughes, Harrison, Foulkes et al. 2018

Interspeech 2018

Self Cite


Semi-automatic systems based on traditional linguistic-phonetic features are increasingly being used for forensic voice comparison (FVC) casework. In this paper, we examine the stability of the output of a semi-automatic system, based on the long-term formant distributions (LTFDs) of F1, F2, and F3, as the channel quality of the input recordings decreases. Cross-validated, calibrated GMM-UBM log likelihood-ratios (LLRs) were computed for 97 Standard Southern British English speakers under four conditions. In each condition the same speech material was used, but the technical properties of the recordings changed (high quality studio recording, landline telephone recording, high bit-rate GSM mobile telephone recording and low bit-rate GSM mobile telephone recording). Equal error rate (EER) and the log LR cost function (Cllr) were compared across conditions. System validity was found to decrease with poorer technical quality, with the largest differences in EER (21.66%) and Cllr (0.46) found between the studio and the low bit-rate GSM conditions. However, importantly, performance for individual speakers was affected differently by channel quality. Speakers that produced stronger evidence overall were found to be more variable. Mean F3 was also found to be a predictor of LLR variability; however, no effects were found based on speakers' voice quality profiles.
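The abstract reports EER and Cllr but gives no formulas. As a minimal sketch of how these two validity metrics are commonly computed from a set of LLRs, the code below assumes natural-log LLRs split into same-speaker and different-speaker comparisons. The function names, the threshold sweep used for the EER, and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cllr(ss_llrs, ds_llrs):
    """Log-LR cost for natural-log LLRs (assumed base).

    ss_llrs: LLRs from same-speaker comparisons.
    ds_llrs: LLRs from different-speaker comparisons.
    """
    ss = np.asarray(ss_llrs, dtype=float)
    ds = np.asarray(ds_llrs, dtype=float)
    # log2(1 + 1/LR) penalises low LRs for same-speaker pairs;
    # log2(1 + LR) penalises high LRs for different-speaker pairs.
    ss_cost = np.mean(np.logaddexp(0.0, -ss)) / np.log(2)
    ds_cost = np.mean(np.logaddexp(0.0, ds)) / np.log(2)
    return 0.5 * (ss_cost + ds_cost)

def eer(ss_llrs, ds_llrs):
    """Approximate equal error rate via a sweep over observed LLRs."""
    ss = np.asarray(ss_llrs, dtype=float)
    ds = np.asarray(ds_llrs, dtype=float)
    thresholds = np.sort(np.concatenate([ss, ds]))
    # Miss: same-speaker LLR below threshold.
    miss = np.array([(ss < t).mean() for t in thresholds])
    # False alarm: different-speaker LLR at or above threshold.
    false_alarm = np.array([(ds >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(miss - false_alarm))
    return 0.5 * (miss[i] + false_alarm[i])
```

Under this definition a system that always outputs an LLR of 0 has a Cllr of 1, so values below 1 indicate that the LLRs carry useful information; the EER sweep is a coarse approximation and would normally be refined by interpolation when few comparisons are available.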


Forensic Voice Comparison Approaches for Low‐Resource Languages

Kruthika, Nagavi, Mahesha 2024

Automatic Speech Recognition and Translation for Low Resource Languages
