The analysis of musical signals to extract audio descriptors that can potentially characterize their timbre has been disparate and often too focused on a particular small set of sounds. The Timbre Toolbox provides a comprehensive set of descriptors that can be useful in perceptual research, as well as in music information retrieval and machine-learning approaches to content-based retrieval in large sound databases. Sound events are first analyzed in terms of various input representations (short-term Fourier transform, harmonic sinusoidal components, an auditory model based on the equivalent rectangular bandwidth concept, the energy envelope). A large number of audio descriptors are then derived from each of these representations to capture temporal, spectral, spectrotemporal, and energetic properties of the sound events. Some descriptors are global, providing a single value for the whole sound event, whereas others are time-varying. Robust descriptive statistics are used to characterize the time-varying descriptors. To examine the information redundancy across audio descriptors, correlational analysis followed by hierarchical clustering is performed. This analysis suggests ten classes of relatively independent audio descriptors, showing that the Timbre Toolbox is a multidimensional instrument for the measurement of the acoustical structure of complex sound signals.
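The pipeline described above (frame-wise descriptor extraction from an STFT, then robust summary statistics over the time-varying series) can be illustrated with a minimal sketch. This is not the Timbre Toolbox's actual implementation; the function names are hypothetical, and the spectral centroid stands in for the toolbox's much larger descriptor set. The median and interquartile range serve as the robust descriptive statistics.

```python
import numpy as np

def spectral_centroid_series(signal, sr, frame=1024, hop=512):
    """Time-varying spectral centroid computed frame by frame from the STFT.

    Each frame is Hann-windowed, its magnitude spectrum is taken, and the
    centroid is the magnitude-weighted mean frequency of that frame.
    """
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    centroids = []
    for start in range(0, len(signal) - frame + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        total = spectrum.sum()
        if total > 0.0:
            centroids.append((freqs * spectrum).sum() / total)
    return np.array(centroids)

def robust_summary(series):
    """Summarize a time-varying descriptor with robust statistics:
    the median (central tendency) and interquartile range (spread)."""
    q25, q50, q75 = np.percentile(series, [25, 50, 75])
    return q50, q75 - q25

# Illustration: for a steady 440 Hz sine tone, the centroid series
# should sit near 440 Hz with little spread.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
centroids = spectral_centroid_series(tone, sr)
median, iqr = robust_summary(centroids)
```

Summarizing with median and IQR rather than mean and variance makes the global value less sensitive to transients (e.g., the attack portion of a note), which is the motivation for robust statistics in this setting.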
The influence of listeners' expertise and sound identification on the categorization of environmental sounds is reported in three studies. In Study 1, the causal uncertainty of 96 sounds was measured by counting the different causes described by 29 participants. In Study 2, 15 experts and 15 nonexperts classified a selection of 60 sounds and indicated the similarities they used. In Study 3, 38 participants indicated their confidence in identifying the sounds. Participants reported using either acoustical similarities or similarities of the causes of the sounds. Experts used acoustical similarity more often than nonexperts, who relied more on the similarity of the sounds' causes. Sounds with low causal uncertainty were more often grouped together because of similarities of the cause, whereas sounds with high causal uncertainty were more often grouped together because of acoustical similarities. The same conclusions were reached for identification confidence. This measure allowed the sound classification to be predicted, and it provides a straightforward method for determining the appropriate description of a sound.
In this article we report on listener categorization of meaningful environmental sounds. A starting point for this study was the phenomenological taxonomy proposed by Gaver (1993b). In the first experimental study, 15 participants classified 60 environmental sounds and indicated the properties shared by the sounds in each class. In a second experimental study, 30 participants classified and described 56 sounds exclusively made by solid objects. The participants were required to concentrate on the actions causing the sounds independent of the sound source. The classifications were analyzed with a specific hierarchical cluster technique that accounted for possible cross-classifications, and the verbalizations were submitted to statistical lexical analyses. The results of the first study highlighted 4 main categories of sounds: solids, liquids, gases, and machines. The results of the second study indicated a distinction between discrete interactions (e.g., impacts) and continuous interactions (e.g., tearing) and suggested that actions and objects were not independent organizational principles. We propose a general structure of environmental sound categorization based on the sounds' temporal patterning, which has practical implications for the automatic classification of environmental sounds.
It is well-established that subjective judgments of perceived urgency of alarm sounds can be affected by acoustic parameters. In this study, the authors investigated an objective measurement, the reaction time (RT), to test the effectiveness of temporal parameters of sounds in the context of warning sounds. Three experiments were performed using a RT paradigm, with two different concurrent visuomotor tracking tasks simulating driving conditions. Experiments 1 and 2 show that RT decreases as interonset interval (IOI) decreases, where IOI is defined as the time elapsed from the onset of one sound pulse to the onset of the next. Experiment 3 shows that temporal irregularity between pulses can capture a listener's attention. These findings lead to concrete recommendations: IOI can be used to modulate warning sound urgency; and temporal irregularity can provoke an arousal effect in listeners. The authors also argue that the RT paradigm provides a useful tool for clarifying some of the factors involved in alarm processing.