Formant frequency dynamics are relevant to forensic speaker identification since they are determined by the shape and size of a speaker’s vocal tract and the way he or she configures the articulators for speech. This study investigates individual differences in the formant dynamics of /aI/ produced by five male Australian English speakers, and the effects of changes in speaking rate and prosodic stress on these differences. F1, F2 and F3 frequencies are examined at equidistant time-normalized intervals through /aI/. At each measurement point a degree of speaker individuality is present, and speaker differentiation improves as increasing numbers of measurement points are considered in combination. Patterns of speaker-specific behaviour are generally consistent across different rate-stress conditions. Discriminant analyses based on predictors from all three formants yield classification rates of 88–95%, with nuclear-stressed /aI/ performing best. The findings suggest that further research to develop techniques for characterizing individual speakers using formant dynamics is warranted.
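As a rough illustration of what measuring a formant "at equidistant time-normalized intervals" can look like, the sketch below linearly interpolates a formant track at evenly spaced fractions of the vowel's duration. The function name, the default of nine points, the inclusion of both endpoints, and the example values are all assumptions for illustration; the abstract does not state how many points were used or how the tracks were interpolated.

```python
def sample_time_normalized(times, values, n_points=9):
    """Sample a formant track at n_points equidistant, time-normalized positions.

    times  : strictly increasing measurement times (seconds) within the vowel
    values : formant frequencies (Hz) measured at those times
    Returns a list of n_points linearly interpolated frequencies, from vowel
    onset (0% of duration) to vowel offset (100% of duration).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time/value pairs")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    start, end = times[0], times[-1]
    samples = []
    j = 0  # index of the left-hand neighbour used for interpolation
    for k in range(n_points):
        # Target time at fraction k / (n_points - 1) of the vowel duration.
        t = start + (end - start) * k / (n_points - 1)
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        samples.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return samples


# Hypothetical F1 track (Hz) for one /aI/ token, invented for illustration.
f1_points = sample_time_normalized([0.0, 0.1, 0.2], [700.0, 600.0, 500.0], n_points=5)
```

Because every token is reduced to the same number of points regardless of its duration, tokens produced at different speaking rates can be compared point by point.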
Individual variation in non-fluency behaviour in normally fluent (NF) adults is investigated. Differences among speakers in the usage of a range of features such as filled and silent pauses, sound prolongations, repetition of phrases, words or part-words, and self-interruptions are explored in the spontaneous speech of 20 male speakers of Standard Southern British English from the DyViS database. The speech analysed is semi-spontaneous, and taken from a simulated police interview task. A taxonomy of fluency features for forensic analysis (TOFFA) was applied to this speech data. The rate of occurrence of each feature per 100 syllables is calculated for each speaker. Results show that individuals vary considerably in the rates of these fluency features occurring in their speech and that between-speaker differences are present in the types of features speakers produce. Implications of the significance of these findings for forensic phonetics are discussed.
KEYWORDS: fluency behaviour, disfluency features, TOFFA, individual differences, speaker-specificity
HIGHLIGHTS
• A detailed taxonomy of disfluency types (TOFFA) is described.
• Individual variation in a range of fluency features is observed.
• A consistency study demonstrates the challenges of identifying disfluencies.
• The significance of disfluencies for forensic speaker comparison is considered.
An examination of the literature yields relatively few studies in which the patterns of fluency phenomena of NF speakers are applied to the speech of PWS. There are, however, studies where definitions of phenomena found in the literature on stuttering are applied to NF speakers. Johnson, Darley and Spriestersbach (1963), using data first presented in Johnson (1961), provide results of an analysis of the fluency features of 50 male and 50 female PWS and 50 male and 50 female NF speakers. The participants were asked to produce three monologues.
Roberts, Meltzer and Wilding (2009) replicated the broad outline of this study using 25 NF adult male speakers. In their summary of the findings of Johnson (1961) and a number of other studies of fluency in NF speakers, Roberts et al. (2009) commented upon the diversity of methods of counting disfluencies in earlier studies. For example, a study may report the frequency of occurrence per 100 words or 100 syllables without defining what is counted as a word or a syllable. Roberts et al. counted interjections, a category in which they included filled pauses and utterances like 'well', 'like' and 'you know'. They also counted revisions, repetitions, prolongations and the use of 'excessive force in producing a sound' (2009: 425), which they termed a block. They did not count silent pauses. Roberts et al. relate the fluency phenomena to occurrence per 100 syllables, which were defined as target (i.e. idealised phonological) syllables only. They report that individuals produce a range of fluency phenomena per 100 syllables, yet even speakers with double the rate of other speakers 'still appear to be speaking well' (2009: 424). Roberts et...
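The per-100-syllable rate referred to in the passages above is simple arithmetic: the count of a feature divided by the speaker's syllable count, multiplied by 100. A minimal sketch follows; the function name, feature labels and counts are hypothetical and are not taken from TOFFA or from any of the cited studies. It also assumes the syllable count has already been decided, which, as Roberts et al. note, is itself a methodological choice.

```python
def rates_per_100_syllables(feature_counts, syllable_count):
    """Convert raw disfluency counts into rates per 100 syllables.

    feature_counts : dict mapping a feature label to its number of occurrences
    syllable_count : total syllables counted for the speaker (must be positive)
    """
    if syllable_count <= 0:
        raise ValueError("syllable_count must be positive")
    return {
        feature: 100.0 * count / syllable_count
        for feature, count in feature_counts.items()
    }


# Invented counts for one hypothetical speaker, for illustration only.
speaker_counts = {"filled_pause": 12, "repetition": 3, "prolongation": 6}
speaker_rates = rates_per_100_syllables(speaker_counts, syllable_count=600)
```

Normalising by syllables rather than by recording length keeps speakers comparable when they differ in how much speech they produce.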
Voice identification parades can be unreliable due to the error-prone nature of earwitness responses. UK government guidelines recommend that voice parades should have nine voices, each played for 60 seconds. This makes parades resource-consuming to construct. In the present paper we conducted two experiments to see if voice parade procedures could be simplified. In Experiment 1 (N=271, 135F), we investigated if reducing the duration of the voice samples on a nine-voice parade would negatively affect identification performance using both conventional logistic and signal detection approaches. In Experiment 2 (N=270, 136F), we first explored if the same sample duration conditions used in Experiment 1 would lead to different outcomes if we reduced the parade size to include only six voices. Following this, we pooled the data from both experiments to investigate the influence of target-position effects. The results show that 15s sample durations result in statistically equivalent voice identification performance to the longer 60s sample durations, but that the 30s sample duration suffers in terms of overall signal sensitivity. This pattern of results was replicated using both a nine- and a six-voice parade. Performance on target-absent parades was at chance level for both parade sizes and response criteria were mostly liberal. Additionally, unwanted position effects were present. The results provide initial evidence that the sample duration used in a voice parade may be reduced, but we argue that the guidelines recommending a parade with nine voices should be maintained to provide additional protection for a potentially innocent suspect given the low target-absent accuracy.
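For readers unfamiliar with the terms "signal sensitivity" and "response criterion", the sketch below computes the textbook equal-variance sensitivity index d′ and criterion c from a yes/no response table. This is an assumption-laden illustration only: the abstract does not say which signal detection model the authors fitted, and parade (lineup) data may call for a different model. The function name, the log-linear correction and the example counts are likewise assumptions.

```python
from statistics import NormalDist


def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Equal-variance signal detection indices from response counts.

    Returns (d_prime, criterion). A negative criterion indicates a liberal
    bias (a tendency to pick a voice); a positive one a conservative bias.
    """
    # Log-linear correction (an assumption here): add 0.5 to each count so
    # that rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Invented counts for illustration; not data from the study.
d_prime, criterion = dprime_and_criterion(
    hits=18, misses=2, false_alarms=12, correct_rejections=8
)
```

In this invented example both the hit rate and the false-alarm rate are above 0.5, so the criterion comes out negative, which is what "liberal" responding means in this framework.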
The DyViS project (‘Dynamic Variability in Speech: a Forensic Phonetic Study of British English’) at the University of Cambridge has compiled a large-scale database of speech recordings which will be freely available for (non-commercial) research purposes. The database comprises recordings of 100 male speakers of Standard Southern British English, aged 18-25, undertaking four tasks involving different speaking styles: a simulated police interview, a telephone call with an ‘accomplice’, a reading passage, and a set of read sentences. This paper describes the motivation for developing the DyViS database and explains its structure, including the novel techniques developed for eliciting spontaneous yet phonetically controlled speech under simulated forensic conditions.