Real-time Magnetic Resonance Imaging (rtMRI) was used to examine mechanisms of sound production in five beatboxers. rtMRI proved to be an effective tool for studying the articulatory dynamics of this form of human vocal production; it provides a dynamic view of the entire midsagittal vocal tract at a frame rate (83 fps) sufficient to observe the movement and coordination of critical articulators. The artists' repertoires included percussion elements generated using a wide range of articulatory and airstream mechanisms. Analysis of three common beatboxing sounds showed that advanced beatboxers produce stronger ejectives and exert greater control over different airstreams than novice beatboxers, enhancing the quality of their sounds. No difference in production mechanisms between males and females was observed. These data offer insights into how articulators can be trained and used to achieve specific acoustic goals.
Previous research suggests that beatboxers use only sounds that exist in the world's languages. This paper provides evidence to the contrary, showing that beatboxers use non-linguistic articulations and airstream mechanisms to produce many sound effects that have not been attested in any language. An analysis of real-time magnetic resonance videos of beatboxing reveals that beatboxers produce non-linguistic articulations such as ingressive retroflex trills and ingressive lateral bilabial trills. In addition, beatboxers can use both lingual egressive and pulmonic ingressive airstreams, neither of which has been reported in any language. The results of this study inform our understanding of the limits of the human vocal tract and bear on questions about the mental units that encode music and phonological grammar.
Film music varies tremendously across genre in order to bring about different responses in an audience. For instance, composers may evoke passion in a romantic scene with lush string passages or inspire fear throughout horror films with inharmonious drones. This study investigates such phenomena through a quantitative evaluation of music that is associated with different film genres. We construct supervised neural network models with various pooling mechanisms to predict a film’s genre from its soundtrack. We use these models to compare handcrafted music information retrieval (MIR) features against VGGish audio embedding features, finding similar performance with the top-performing architectures. We examine the best-performing MIR feature model through permutation feature importance (PFI), determining that mel-frequency cepstral coefficient (MFCC) and tonal features are most indicative of musical differences between genres. We investigate the interaction between musical and visual features with a cross-modal analysis, and do not find compelling evidence that music characteristic of a certain genre implies low-level visual features associated with that genre. Furthermore, we provide software code to replicate this study at https://github.com/usc-sail/mica-music-in-media. This work adds to our understanding of music’s use in multi-modal contexts and offers the potential for future inquiry into human affective experiences.
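To make the modeling pipeline concrete, the sketch below illustrates the general approach described in the abstract: frame-level audio features are pooled into a clip-level representation, a classifier predicts the film genre, and permutation feature importance (PFI) ranks the features. This is not the authors' code; it uses synthetic random features in place of MFCC or VGGish embeddings, mean pooling as one of the possible pooling mechanisms, and a logistic regression as a stand-in for the neural network models.

```python
# Minimal sketch (assumptions noted above, not the study's implementation):
# genre prediction from pooled frame-level audio features, plus PFI.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n_clips, n_frames, n_feats, n_genres = 200, 100, 20, 4

# Synthetic frame-level features standing in for MFCCs or VGGish embeddings.
frames = rng.normal(size=(n_clips, n_frames, n_feats))
labels = rng.integers(0, n_genres, size=n_clips)

# Mean pooling collapses the time axis into one clip-level feature vector.
pooled = frames.mean(axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(pooled, labels, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # stand-in classifier

# PFI: shuffle one feature at a time and measure the drop in test accuracy.
pfi = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
for i in np.argsort(pfi.importances_mean)[::-1][:5]:
    print(f"feature {i}: importance {pfi.importances_mean[i]:.4f}")
```

In the study itself, neural models with different pooling mechanisms fill the classifier role, and PFI is applied to the best-performing MIR feature model; the sketch only shows the shape of that workflow.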