Music is present in every known society, yet varies from place to place. What is universal to the perception of music? We measured a signature of mental representations of rhythm in 923 participants from 39 participant groups in 15 countries across 5 continents, spanning urban societies, indigenous populations, and online participants. Listeners reproduced random “seed” rhythms; their reproductions were fed back as the stimulus (as in the game of “telephone”), such that their biases (the prior) could be estimated from the distribution of reproductions. Every tested group showed a prior with peaks at integer ratio rhythms, suggesting that discrete rhythm “categories” at small integer ratios are universal. The occurrence and relative importance of different integer ratio categories varied across groups, often reflecting local musical systems. However, university students and online participants in non-Western countries tended to resemble Western participants, underrepresenting the variability otherwise evident across cultures. The results suggest the universality of discrete mental representations of music while showing their interaction with culture-specific traditions.
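As a rough illustration of how a serial-reproduction (“telephone”) paradigm can expose a prior, the sketch below simulates chains in which each reproduction becomes the next stimulus and then pools late-iteration reproductions. The listener model, integer-ratio prototypes, and parameter values are illustrative assumptions, not the study’s actual procedure or code.

```python
# Minimal sketch of serial reproduction for estimating a rhythm prior.
# All names, the listener model, and the category prototypes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# A rhythm = three inter-onset intervals normalized to sum to 1 (a point on the 2-simplex).
# Hypothetical integer-ratio categories the simulated listener is biased toward.
PROTOTYPES = np.array([
    [1/3, 1/3, 1/3],   # 1:1:1
    [1/4, 1/4, 1/2],   # 1:1:2
    [1/4, 1/2, 1/4],   # 1:2:1
    [1/2, 1/4, 1/4],   # 2:1:1
])

def reproduce(stimulus, attraction=0.4, noise=0.03):
    """One simulated reproduction: pulled toward the nearest category, plus motor noise."""
    nearest = PROTOTYPES[np.argmin(np.linalg.norm(PROTOTYPES - stimulus, axis=1))]
    r = (1 - attraction) * stimulus + attraction * nearest
    r = np.clip(r + rng.normal(0, noise, 3), 1e-3, None)
    return r / r.sum()          # renormalize back onto the simplex

def run_chain(n_iterations=10):
    """Seed with a random rhythm; each reproduction is fed back as the next stimulus."""
    chain = [rng.dirichlet([1, 1, 1])]
    for _ in range(n_iterations):
        chain.append(reproduce(chain[-1]))
    return np.array(chain)

# Pool late-iteration reproductions across many chains: their distribution
# approximates the prior and should peak near the integer-ratio prototypes.
late = np.vstack([run_chain()[-3:] for _ in range(500)])
counts = np.bincount(
    np.argmin(np.linalg.norm(late[:, None, :] - PROTOTYPES[None], axis=2), axis=1),
    minlength=len(PROTOTYPES))
print(dict(zip(["1:1:1", "1:1:2", "1:2:1", "2:1:1"], counts)))
```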
Deep learning models have improved cutting-edge technologies in many research areas, but their black-box structure makes it difficult to understand their inner workings and the rationale behind their predictions. This can lead to unintended effects, such as susceptibility to adversarial attacks or the reinforcement of biases. Despite increasing interest in developing deep learning models that explain their decisions, research in the audio domain remains scarce. To reduce this gap, we propose a novel interpretable deep learning model for automatic sound classification, which explains its predictions based on the similarity of the input to a set of learned prototypes in a latent space. We leverage domain knowledge by designing a frequency-dependent similarity measure and by considering different time-frequency resolutions in the feature space. The proposed model achieves results comparable to those of state-of-the-art methods in three different sound classification tasks involving speech, music, and environmental audio. In addition, we present two automatic methods to prune the proposed model that exploit its interpretability. Our system is open source and is accompanied by a web application for the manual editing of the model, which allows for a human-in-the-loop debugging approach.
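The following is a minimal sketch of the general idea of prototype-based classification with a frequency-dependent similarity measure. The toy encoder, tensor shapes, and weighting scheme are assumptions for illustration and do not reproduce the proposed model.

```python
# Hedged sketch: classify by similarity to learned prototypes in a latent space,
# with distances weighted per frequency band. Not the authors' implementation.
import torch
import torch.nn as nn

class PrototypeClassifier(nn.Module):
    def __init__(self, n_mels=64, latent_dim=32, n_prototypes=10, n_classes=5):
        super().__init__()
        # Toy encoder: keeps the frequency axis so similarity can be weighted per band.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, latent_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((n_mels, 1)),   # collapse time, keep frequency
        )
        # Learned prototypes in the same latent space: (P, latent_dim, n_mels).
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, latent_dim, n_mels))
        # Learned per-frequency weights used inside the similarity measure.
        self.freq_weights = nn.Parameter(torch.ones(n_mels))
        # Linear layer mapping prototype similarities to class logits.
        self.classifier = nn.Linear(n_prototypes, n_classes)

    def forward(self, spec):                       # spec: (B, 1, n_mels, n_frames)
        z = self.encoder(spec).squeeze(-1)         # (B, latent_dim, n_mels)
        # Squared distance to each prototype, per frequency band...
        d = ((z.unsqueeze(1) - self.prototypes.unsqueeze(0)) ** 2).sum(dim=2)  # (B, P, n_mels)
        # ...combined with learned frequency weights and mapped to a similarity.
        w = torch.softmax(self.freq_weights, dim=0)
        similarity = torch.exp(-(d * w).sum(dim=-1))                           # (B, P)
        return self.classifier(similarity), similarity

model = PrototypeClassifier()
logits, sims = model(torch.randn(2, 1, 64, 128))
print(logits.shape, sims.shape)   # torch.Size([2, 5]) torch.Size([2, 10])
```

The returned similarities are what make the prediction inspectable: each logit is a linear combination of how close the input is to each prototype, so low-similarity prototypes are natural candidates for pruning.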
What, if any, similarities and differences between song and speech are consistent across cultures? Both song and speech are found in all known human societies and are argued to share evolutionary roots and cognitive resources, yet no studies have compared similarities and differences between song and speech across languages on a global scale. We will compare sets of matched song/speech recordings produced by our 81 coauthors, whose first or heritage languages span 23 language families. Each recording set consists of singing, recited lyrics, and spoken description, plus an optional instrumental version of the sung melody, allowing us to capture a “musi-linguistic continuum” from instrumental music to naturalistic speech. Our literature review and pilot analysis using five audio recording sets (by speakers of Japanese, English, Farsi, Yoruba, and Marathi) led us to make six predictions for confirmatory analysis comparing song vs. spoken descriptions: three consistent differences and three consistent similarities. For differences, we predict that: 1) songs will have higher pitch than speech, 2) songs will be slower than speech, and 3) songs will have more stable pitch than speech. For similarities, we predict that 4) pitch interval size, 5) timbral brightness, and 6) pitch declination will be similar for song and speech. Because our opportunistic language sample (approximately half are Indo-European languages) and unusual design involving coauthors as participants (approximately 1/5 of coauthors had some awareness of our hypotheses when we recorded our singing/speaking) could affect our results, we will include robustness analyses to test whether our conclusions hold despite these potential biases. Other features (e.g., rhythmic isochronicity, loudness) and comparisons involving instrumental melodies and recited lyrics will be investigated through post-hoc exploratory analyses. Our sample size of n=80 people providing sung/spoken recordings already exceeds the number of recordings (i.e., 60) required to achieve 95% power at an alpha level of 0.05 for the hypothesis tests of the six selected features. Our study will provide diverse cross-linguistic empirical evidence regarding the existence of cross-cultural regularities in song and speech, shed light on factors shaping humanity’s two universal vocal communication forms, and provide rich cross-cultural data to generate new hypotheses and inform future analyses of other factors (e.g., functional context, sex, age, musical/linguistic experience) that may shape global musical and linguistic diversity.
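As an illustration of the kind of paired, one-sided comparison implied by prediction 1 (songs higher in pitch than speech), the sketch below runs a signed-rank test on simulated per-participant pitch values. The feature values are made up, and the registered analysis pipeline (feature extraction, test choice, multiple-comparison handling) may differ.

```python
# Hedged sketch of a paired, one-sided test for one feature on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_recordings = 60   # the number of paired recordings cited for 95% power

# Hypothetical median f0 (Hz) per participant for song and for spoken description.
speech_f0 = rng.normal(180, 30, n_recordings)
song_f0 = speech_f0 + rng.normal(40, 25, n_recordings)   # songs assumed higher

# Prediction 1: songs have higher pitch than speech -> one-sided paired test.
res = stats.wilcoxon(song_f0, speech_f0, alternative="greater")
print(f"Wilcoxon signed-rank: statistic={res.statistic:.1f}, p={res.pvalue:.2g}")
```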