Long term examination of intra-session and inter-session speaker variability

Lawson, Aaron; Stauffer, Allen; Smolenski, Brett Y.; Pokines, B.; Leonard, M.R.; Cupples, Edward J.

doi:10.21437/interspeech.2009-734

Cited by 7 publications

(2 citation statements)

References 10 publications

“…Simply put, the identity information of the speaker is embedded (primarily) in how speech is spoken, though the speaker's lexical choice can also be incorporated into the analysis and the degree of overlap in the content of speech between questioned and known recordings certainly impact on SR effectiveness [10] [11]. The behavioral component makes speech signals prone to greater variability such that even the same person would not say the same words in the same way every time (this is known in different bodies of literature as "style-shifting" or "intra-speaker variability") [12]. Differences in recording devices and transmission methods only exacerbate a problem already inherent in SR [13] [14].…”

Section: Mismatch Of Conditions (mentioning)

confidence: 99%

“…Given the behavioral component of speech, some characteristics of an audio sample are prone to variability over the duration of the recording. As an example of this within-session variability [12], a person may speak in a neutral tone at the beginning of a recording but with anger at another moment. In such cases, it may not be possible to locate relevant population data with the same transition in conditions, and therefore, a suggested protocol would be to analyze the neutral and angered parts independently or select only the neutral part for comparison.…”

Section: Mismatch Of Conditions (mentioning)

confidence: 99%

Issues in Data Processing and Relevant Population Selection

2022

In Forensic Automatic Speaker Recognition (FASR), forensic examiners typically compare audio recordings of a speaker whose identity is in question with recordings of known speakers to assist investigators and triers of fact in a legal proceeding. The performance of automated speaker recognition (SR) systems used for this purpose depends largely on the characteristics of the speech samples being compared. Examiners must understand the requirements of specific systems in use as well as the audio characteristics that impact system performance. Mismatch conditions between the known and questioned data samples are of particular importance, but the need for, and impact of, audio pre-processing must also be understood. The data selected for use in a relevant population can also be critical to the performance of the system. This document describes issues that arise in the processing of case data and in the selection of a relevant population for purposes of conducting an examination using a human-supervised automatic speaker recognition approach in a forensic context. The document is intended to comply with the Organization of Scientific Area Committees (OSAC) for Forensic Science Technical Guidance Document.

Effects of Long-Term Ageing on Speaker Verification

Kelly; Harte

2011

Lecture Notes in Computer Science

Abstract. The changes that occur in the human voice due to ageing have been well documented. The impact of these changes on speaker verification is less clear. In this work, we examine the effect of long-term vocal ageing on a speaker verification system. On a cohort of 13 adult speakers, using a conventional GMM-UBM system, we carry out longitudinal testing of each speaker across a time span of 30-40 years. We uncover a progressive degradation in verification score as the time span between the training and test material increases. The addition of temporal information to the features causes the rate of degradation to increase. No significant difference was found between MFCC and PLP features. Subsequent experiments show that the effect of short-term ageing (<5 years) is not significant compared with normal inter-session variability. Above this time span however, ageing has a detrimental effect on verification. Finally, we show that the age of the speaker at the time of training influences the rate at which the verification scores degrade. Our results suggest that the verification score drop-off accelerates for speakers over the age of 60. The results presented are the first of their kind to quantify the effect of long-term vocal ageing on speaker verification.
