Interspeech 2018
DOI: 10.21437/interspeech.2018-61

Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons

Abstract: It is common to see voice recordings presented as forensic traces in court. Generally, a forensic expert is asked to analyze both the suspect's and the criminal's voice samples in order to indicate whether the evidence supports the prosecution (same-speaker) or the defence (different-speaker) hypothesis. This process is known as Forensic Voice Comparison (FVC). Since the emergence of the DNA typing model, the likelihood-ratio (LR) framework has become the gold standard in forensic sciences. The LR not only supports…
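In the LR framework mentioned in the abstract, the expert reports LR = p(E | Hp) / p(E | Hd): the probability of the evidence E under the same-speaker (prosecution) hypothesis divided by its probability under the different-speaker (defence) hypothesis, so values above 1 (log LR above 0) support Hp. The sketch below shows one simple score-based way to turn an automatic system's comparison score into such a ratio. It assumes Gaussian models fitted to calibration scores from target (same-speaker) and non-target (different-speaker) comparisons; the function name and the Gaussian assumption are illustrative and are not taken from this paper.

```python
import math
from statistics import NormalDist


def score_to_log10_lr(score, target_scores, nontarget_scores):
    """Score-based log10 LR under Gaussian score models (illustrative sketch).

    target_scores: calibration scores from same-speaker comparisons (Hp).
    nontarget_scores: calibration scores from different-speaker comparisons (Hd).
    """
    # Fit one normal distribution per hypothesis (sample mean and sample std. dev.).
    hp_model = NormalDist.from_samples(target_scores)
    hd_model = NormalDist.from_samples(nontarget_scores)
    # LR = p(score | Hp) / p(score | Hd), reported on a log10 scale.
    # Note: pdf values may underflow to 0 for scores far from both models.
    return math.log10(hp_model.pdf(score) / hd_model.pdf(score))


target = [2.0, 2.5, 3.0, 3.5, 4.0]
nontarget = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(score_to_log10_lr(3.0, target, nontarget))  # positive: supports Hp
```

A positive result favours the same-speaker hypothesis and a negative one the different-speaker hypothesis. The Gaussian models are chosen here only for brevity; in practice, logistic-regression calibration of scores is frequently used instead.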

Cited by 2 publications (2 citation statements)
References 24 publications
“…Compared with x-vec, x-vec+asoftmax achieves a 4.6% EER reduction at the cost of target comparison accuracy. Meanwhile, it is worth noting that x-vec+asoftmax+rhythm significantly improves the target comparison without affecting the accuracy of the non-target comparison, which is consistent with the conclusion in [19]. In addition, multi-task knowledge distillation gives the student-64-TS network better performance in both target and non-target comparisons, with a 7.1% EER reduction.…”
Section: Effect on Intra- and Inter-speaker Verification (supporting)
confidence: 83%
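The EER figures quoted above summarise how much target (same-speaker) and non-target (different-speaker) scores overlap: the EER is the error rate at the threshold where the false-rejection rate on target trials equals the false-acceptance rate on non-target trials. The following is a minimal threshold-sweep sketch, assuming that higher scores mean "more likely the same speaker" and approximating the EER by the mean of the two rates at the threshold where they are closest; the function name is hypothetical and this is not the evaluation code of the citing paper.

```python
def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate by sweeping thresholds over observed scores.

    A trial is accepted (judged same-speaker) when score >= threshold.
    """
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejections: target trials scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptances: non-target trials scored at or above the threshold.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if gap < best_gap:  # keep the threshold where FRR and FAR are closest
            best_gap, eer = gap, (frr + far) / 2
    return eer


print(compute_eer([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0]))
```

Separating target from non-target error rates in this way is also what lets a system be assessed on each comparison type individually, as the quoted passage does.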
“…Second, change in the short-term spectral envelope (as used by the ASV systems) impacts formants, and we therefore expect them to form reasonable predictors of the LLR score. Concerning prosody, there is a vast body of literature ranging from frame-level F0 characterization (Mary and Yegnanarayana, 2008), stylized (Shriberg et al, 2005; Adami, 2007) and polynomially modeled F0 contours (Dehak et al, 2007), along with energy and timing- or rhythm-related features (Dellwo et al, 2012; Ajili et al, 2018). A number of studies have addressed the impact of such parameters in a forensic context (Leemann et al, 2014; Moez et al, 2016).…”
Section: Predictor Variables: Acoustic and Prosodic Features (mentioning)
confidence: 99%
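Rhythm-related features of the kind cited above are often computed from the durations of vocalic and consonantal intervals. The sketch below computes a few widely used interval metrics: %V (share of total duration that is vocalic), ΔV and ΔC (standard deviations of vocalic and consonantal interval durations), VarcoV (ΔV normalised by the mean vocalic duration) and the normalised Pairwise Variability Index over vocalic intervals. Which metrics the cited studies actually used is not stated here, so this selection, the function names, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def npvi(durations):
    """Normalised Pairwise Variability Index over successive intervals (needs >= 2)."""
    pairs = zip(durations, durations[1:])
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)


def rhythm_metrics(vocalic, consonantal):
    """Interval-based rhythm metrics from vocalic and consonantal durations."""
    total = sum(vocalic) + sum(consonantal)
    delta_v = pstdev(vocalic)  # population std. dev. (an assumption)
    return {
        "percent_v": 100 * sum(vocalic) / total,
        "delta_v": delta_v,
        "delta_c": pstdev(consonantal),
        "varco_v": 100 * delta_v / mean(vocalic),
        "npvi_v": npvi(vocalic),
    }


print(rhythm_metrics([0.08, 0.15, 0.11], [0.06, 0.09]))
```

Durations may be given in any unit; %V, VarcoV and nPVI are unit-free ratios, while ΔV and ΔC keep the unit of the input, which is one reason normalised measures are often preferred when speaking rate varies between recordings.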