Interspeech 2009
DOI: 10.21437/interspeech.2009-734

Long term examination of intra-session and inter-session speaker variability

citations
Cited by 7 publications
(2 citation statements)
references
References 10 publications
“…Simply put, the identity information of the speaker is embedded (primarily) in how speech is spoken, though the speaker's lexical choice can also be incorporated into the analysis and the degree of overlap in the content of speech between questioned and known recordings certainly impact on SR effectiveness [10] [11]. The behavioral component makes speech signals prone to greater variability such that even the same person would not say the same words in the same way every time (this is known in different bodies of literature as "style-shifting" or "intra-speaker variability") [12]. Differences in recording devices and transmission methods only exacerbate a problem already inherent in SR [13] [14].…”
Section: Mismatch Of Conditions (mentioning)
confidence: 99%
“…Given the behavioral component of speech, some characteristics of an audio sample are prone to variability over the duration of the recording. As an example of this within-session variability [12], a person may speak in a neutral tone at the beginning of a recording but with anger at another moment. In such cases, it may not be possible to locate relevant population data with the same transition in conditions, and therefore, a suggested protocol would be to analyze the neutral and angered parts independently or select only the neutral part for comparison.…”
Section: Mismatch Of Conditions (mentioning)
confidence: 99%