2018
DOI: 10.1101/436477
Preprint

CLEESE: An open-source audio-transformation toolbox for data-driven experiments in speech and music cognition

Abstract: Over the past few years, the field of visual social cognition and face processing has been dramatically impacted by a series of data-driven studies employing computer-graphics tools to synthesize arbitrary meaningful facial expressions. In the auditory modality, reverse correlation is traditionally used to characterize sensory processing at the level of spectral or spectro-temporal stimulus properties, but not higher-level cognitive processing of e.g. words, sentences or music, by lack of tools able …

Cited by 6 publications (12 citation statements)
References 52 publications
“…Changes in the subglottal pressure due to the contraction of thoracic and abdominal muscles, which are controlled from the anterior horn of the spinal cord, primarily lead to modulations of voice intensity. At moderate intensities, such as in happy, aroused voices compared to calm or sad voices, the effect of the modulation is carried linearly through the vocal pathway and can be simulated with a simple scalar multiplication of the recording’s root mean square (RMS) intensity (see e.g., Ilie & Thompson, 2006) or, for arbitrary intensity profiles, a piece-wise linear function as implemented for example in the reverse-correlation toolbox CLEESE (Burred, Ponsot, Goupil, Liuni, & Aucouturier, 2019).…”
Section: Voice Transformations Along the Vocal Production Pathway (mentioning)
confidence: 99%
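
The two intensity manipulations described in this statement can be illustrated with a minimal NumPy sketch. This is not CLEESE's API: the function names, the breakpoint format, and the choice to express breakpoints as linear gain factors (rather than dB) are assumptions made for illustration only.

```python
import numpy as np

def rms(x):
    """Root mean square of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

def scale_intensity(x, gain):
    """Scalar multiplication: the output RMS is gain times the input RMS."""
    return gain * x

def gain_envelope(n_samples, sr, times_s, gains):
    """Per-sample gain obtained by linear interpolation between breakpoints.

    times_s and gains are hypothetical breakpoint lists (seconds, linear gain).
    """
    t = np.arange(n_samples) / sr
    return np.interp(t, times_s, gains)

def apply_gain_profile(x, sr, times_s, gains):
    """Apply a piece-wise linear intensity profile to a mono signal."""
    return x * gain_envelope(len(x), sr, times_s, gains)

# Synthetic 1-second test tone standing in for a voice recording.
sr = 16000
t = np.arange(sr) / sr
x = 0.1 * np.sin(2 * np.pi * 220 * t)

louder = scale_intensity(x, 2.0)
envelope = gain_envelope(len(x), sr, [0.0, 0.5, 1.0], [1.0, 2.0, 1.0])
shaped = apply_gain_profile(x, sr, [0.0, 0.5, 1.0], [1.0, 2.0, 1.0])
```

Here `louder` has twice the RMS of `x`, while `shaped` rises to twice the original amplitude at 0.5 s and falls back afterwards.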
“…Simple algorithms, as used for example in altered auditory feedback research (Hain, Burnett, Larson, & Kiran, 2001) and implemented in the DAVID toolbox (Rachman et al, 2018), are based on resampling or multiple delay lines (a technique that introduces a small delay to an audio signal in order to play it faster/slower, thus raising/lowering its pitch; Dattorro, 1997) and may alter vocal tract filtering or formants unrealistically beyond small parametric changes. State-of-the-art techniques that allow separating source and filter information to avoid such artefacts are based on reconstructions of the signal’s short-time Fourier transform (STFT) at nonuniform rates, such as the pitch synchronous overlap and add (PSOLA) method as implemented for example in PRAAT (Boersma & Weenink, 2002); the phase-vocoder method (Moulines & Laroche, 1995) as implemented for example in CLEESE (Burred et al, 2019); or pitch-adaptive analyses techniques such as the adaptive interpolation of weighted spectrum method as implemented in STRAIGHT (Kawahara, 1997). These transformation methods not only allow raising or lowering the mean pitch of a recording, which may correspond to a baseline change of valence (see Figure 3A; see e.g., Ilie & Thompson, 2006), but can also manipulate the difference between the instantaneous and mean F0 to exaggerate or lessen variations, as seen for example in fearful versus sad vocalizations (see Figure 3B; see e.g., Pell & Kotz, 2011); create parametric F0 contours such as vibrato in anxious voices (Figure 3C; see e.g., Bachorowski & Owren, 1995), or local intonations at the start or end of an utterance, as in surprised or assertive speech (see Figure 3D; see e.g., Jiang & Pell, 2017).…”
Section: Voice Transformations Along the Vocal Production Pathway (mentioning)
confidence: 99%
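
The F0 manipulations listed in this statement (mean shift, exaggerated or reduced variation, vibrato) can be sketched on a frame-wise F0 contour. The sketch below only computes target contours; resynthesizing audio from them (for example with a phase vocoder or PSOLA) is not shown. All function names, the use of cents for shifts, and the choice to scale deviations in Hz around the mean are assumptions for illustration and do not reproduce any toolbox's actual interface.

```python
import numpy as np

def shift_mean(f0_hz, cents):
    """Raise or lower the whole contour by a fixed interval in cents."""
    return f0_hz * 2.0 ** (cents / 1200.0)

def scale_variation(f0_hz, factor):
    """Scale the deviation of each frame from the mean F0.

    factor > 1 exaggerates variations; 0 <= factor < 1 flattens them.
    """
    mean = np.mean(f0_hz)
    return mean + factor * (f0_hz - mean)

def add_vibrato(f0_hz, frame_rate_hz, rate_hz, depth_cents):
    """Superimpose a sinusoidal modulation of given rate and depth."""
    t = np.arange(len(f0_hz)) / frame_rate_hz
    modulation_cents = depth_cents * np.sin(2 * np.pi * rate_hz * t)
    return f0_hz * 2.0 ** (modulation_cents / 1200.0)

# Hypothetical contour: five voiced frames, values in Hz.
f0 = np.array([200.0, 220.0, 240.0, 220.0, 200.0])

raised = shift_mean(f0, 100.0)            # one semitone up
exaggerated = scale_variation(f0, 2.0)    # doubled excursions around the mean
flattened = scale_variation(f0, 0.5)      # halved excursions around the mean
vibrato = add_vibrato(np.full(100, 220.0), frame_rate_hz=100.0,
                      rate_hz=5.0, depth_cents=50.0)
```

Scaling deviations in Hz keeps the arithmetic mean unchanged; working in cents or semitones instead would preserve the geometric mean, which may be preferable perceptually.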