2019
DOI: 10.1371/journal.pone.0205943

CLEESE: An open-source audio-transformation toolbox for data-driven experiments in speech and music cognition

Abstract: Over the past few years, the field of visual social cognition and face processing has been dramatically impacted by a series of data-driven studies employing computer-graphics tools to synthesize arbitrary meaningful facial expressions. In the auditory modality, reverse correlation is traditionally used to characterize sensory processing at the level of spectral or spectro-temporal stimulus properties, but not higher-level cognitive processing of e.g. words, sentences or music, by lack of tools able to manipul…

Cited by 23 publications (33 citation statements)
References 57 publications
“…SIR can also be applied to study any parametrizable sensory stimulus spaces (e.g. auditory [24,25,34,55], as well as other cognitive, social and affective tasks; for reviews see [23,56,57]) and to study the information processing mechanisms of both brain and in silico architectures. We propose, therefore, that the time is ripe to exploit the full capabilities of modern brain-imaging technologies and to embrace richer designs that exploit the trial-by-trial trivariate 〈stimulus information; brain; behaviour〉.…”
Section: Results
confidence: 99%
“…aroused voices compared to calm or sad voices, the effect of the modulation is carried linearly through the vocal pathway and can be simulated with a simple scalar multiplication of the recording's root mean square (RMS) intensity (see e.g., Ilie & Thompson, 2006) or, for arbitrary intensity profiles, a piecewise linear function as implemented for example in the reverse-correlation toolbox CLEESE (Burred, Ponsot, Goupil, Liuni, & Aucouturier, 2019).…”
Section: Glottal Source Transformations
confidence: 99%
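The scalar RMS multiplication and piecewise-linear intensity profile described in the statement above can be sketched in a few lines of numpy. This is an illustrative sketch only, not CLEESE's actual implementation; the function names are hypothetical.

```python
import numpy as np

def scale_rms(signal, gain):
    """Scale a recording by a scalar; its RMS intensity scales by the same factor."""
    return signal * gain

def piecewise_gain(signal, sr, times, gains):
    """Apply a piecewise-linear intensity profile: gain values anchored at
    `times` (seconds) are linearly interpolated to one gain per sample."""
    t = np.arange(len(signal)) / sr          # timestamp of each sample
    envelope = np.interp(t, times, gains)    # piecewise-linear gain curve
    return signal * envelope
```

Because RMS is homogeneous of degree one, multiplying the waveform by a factor `g` multiplies its RMS by the same `g`; the piecewise variant simply makes that factor time-varying.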
“…Simple algorithms, as used for example in altered auditory feedback research (Hain, Burnett, Larson, & Kiran, 2001) and implemented in the DAVID toolbox (Rachman et al., 2018), are based on resampling or multiple delay lines (a technique that introduces a small delay to an audio signal in order to play it faster/slower, thus raising/lowering its pitch; Dattorro, 1997) and may alter vocal tract filtering or formants unrealistically beyond small parametric changes. State-of-the-art techniques that allow separating source and filter information to avoid such artefacts are based on reconstructions of the signal's short-time Fourier transform (STFT) at non-uniform rates, such as the pitch synchronous overlap and add (PSOLA) method as implemented for example in PRAAT (Boersma & Weenink, 2002); the phase-vocoder method (Moulines & Laroche, 1995) as implemented for example in CLEESE (Burred et al., 2019); or pitch-adaptive analysis techniques such as the adaptive interpolation of weighted spectrum method as implemented in STRAIGHT (Kawahara, 1997). These transformation methods not only allow raising or lowering the mean pitch of a recording, which may correspond to a baseline change of valence (see Figure 3A; see e.g., Ilie & Thompson, 2006), but can also manipulate the difference between the instantaneous and mean F0 to exaggerate or lessen variations, as seen for example in fearful versus sad vocalizations (see Figure 3B; see e.g., Pell & Kotz, 2011); create parametric F0 contours such as vibrato in anxious voices (Figure 3C; see e.g., Bachorowski & Owren, 1995), or local intonations at the start or end of an utterance, as in surprised or assertive speech (see Figure 3D; see e.g., Jiang & Pell, 2017).…”
Section: Not All Glottal Source Changes Are Easily Simulated With Voi…
confidence: 99%
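The resampling approach described above, and its drawback of changing duration and formants along with pitch, can be illustrated with a deliberately naive numpy sketch (the function name and details are assumptions, not code from DAVID or CLEESE):

```python
import numpy as np

def resample_pitch_shift(signal, semitones):
    """Naive resampling pitch shift: reading the samples faster raises pitch,
    slower lowers it. Unlike PSOLA or phase-vocoder methods, this also shortens
    or lengthens the recording and shifts formants together with F0."""
    rate = 2.0 ** (semitones / 12.0)                # playback-speed factor
    read_pos = np.arange(0, len(signal) - 1, rate)  # fractional read positions
    return np.interp(read_pos, np.arange(len(signal)), signal)
```

Shifting up one octave (`semitones=12`) halves the output length, which is exactly the duration artefact that STFT-based methods such as PSOLA and the phase vocoder are designed to avoid.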
“…SIR can also be applied to study any parametrizable sensory stimulus spaces (e.g. auditory 20,21,53,54, as well as other cognitive, social and affective tasks; for reviews see 19,55,56) and to study the information processing mechanisms of both brain and in silico architectures.…”
Section: Results
confidence: 99%