Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species’ vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods for projecting animal vocalizations into low dimensional latent representational spaces that are directly learned from the spectrograms of vocal signals. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates. Latent projections uncover complex features of data in visually intuitive and quantifiable ways, enabling high-powered comparative analyses of vocal acoustics. We introduce methods for analyzing vocalizations as both discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication.
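The core pipeline described here (spectrogram → low-dimensional latent projection) can be sketched in a few lines. The papers learn the projection with nonlinear methods such as UMAP; the dependency-free sketch below substitutes a linear PCA projection (via SVD) on synthetic "spectrograms", so both the data and the method are stand-ins for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_spectrogram(peak_bin, n_freq=32, n_time=16):
    """Toy 'spectrogram': a noisy energy band around peak_bin."""
    spec = rng.normal(0, 0.1, (n_freq, n_time))
    spec[peak_bin - 2 : peak_bin + 2, :] += 1.0
    return spec

# Two hypothetical call types with different dominant frequencies.
calls = [synthetic_spectrogram(8) for _ in range(50)] + \
        [synthetic_spectrogram(24) for _ in range(50)]
X = np.stack([c.ravel() for c in calls])   # one flattened spectrogram per row

# Linear latent projection (PCA via SVD) standing in for UMAP.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
latent = Xc @ Vt[:2].T                     # 2-D latent coordinates

# Calls of the same type should cluster together in latent space.
d_within = np.linalg.norm(latent[:50].mean(0) - latent[0])
d_between = np.linalg.norm(latent[:50].mean(0) - latent[50:].mean(0))
print(d_between > d_within)
```

With real recordings the flattened spectrograms would come from the audio (e.g. padded syllable spectrograms), and the SVD step would be replaced by the nonlinear embedding; the clustering-by-call-type behavior is the property the papers exploit.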
Human speech possesses a rich hierarchical structure that allows for meaning to be altered by words spaced far apart in time. Conversely, the sequential structure of nonhuman communication is thought to follow non-hierarchical Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from multiple languages by modeling the information content of signals as a function of the sequential distance between vocal elements. Across short sequence-distances, an exponential decay dominates the information in speech and birdsong, consistent with underlying Markovian processes. At longer sequence-distances, the decay in information follows a power law, consistent with underlying hierarchical processes. Thus, the sequential organization of acoustic elements in two learned vocal communication signals (speech and birdsong) shows functionally equivalent dynamics, governed by similar processes.
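The two decay regimes can be made concrete with the composite model the analysis implies: an exponential term (Markovian structure) plus a power-law term (hierarchical structure) in the information between elements at sequence-distance d. The parameter values below are illustrative, not fitted to any speech or birdsong data.

```python
import numpy as np

# Composite decay model: information(d) = a*exp(-d/tau) + b*d**(-alpha).
# Illustrative parameters only.
a, tau = 1.0, 3.0        # exponential amplitude and timescale
b, alpha = 0.1, 0.5      # power-law amplitude and exponent

d = np.arange(1, 1001)   # sequence-distance between vocal elements
exponential = a * np.exp(-d / tau)
power_law = b * d ** (-alpha)
info = exponential + power_law

# At short sequence-distances the exponential term dominates ...
print(exponential[0] > power_law[0])       # d = 1
# ... while at long distances only the power law remains.
print(power_law[500] > exponential[500])   # d = 501
```

Fitting this three-component family (exponential-only, power-law-only, composite) to empirical mutual-information curves and comparing model fits is the kind of analysis that distinguishes the two regimes.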
UMAP is a nonparametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (fuzzy simplicial complex) and (2) through stochastic gradient descent, optimizing a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that parametric UMAP performs comparably to its nonparametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semisupervised learning by capturing structure in unlabeled data.
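The two steps can be sketched under heavy simplifying assumptions: below, the fuzzy simplicial complex is reduced to a plain kNN graph, and UMAP's cross-entropy objective is replaced by a simple contrastive attraction/repulsion loss. What the sketch does preserve is the defining move of parametric UMAP: the embedding is produced by a network f(x), and SGD updates the network's weights rather than free per-point coordinates (the reference implementation instead trains a Keras network on the actual UMAP objective).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two clusters in 10 dimensions.
X = np.vstack([rng.normal(0, 0.3, (30, 10)),
               rng.normal(3, 0.3, (30, 10))])
n = len(X)

# Step 1 (simplified): a kNN graph standing in for the fuzzy simplicial
# complex; every edge gets weight 1.
D = np.linalg.norm(X[:, None] - X[None], axis=-1)
np.fill_diagonal(D, np.inf)
edges = [(i, j) for i in range(n) for j in np.argsort(D[i])[:5]]

# Step 2: a parametric embedding f(x) = relu(x W1) W2 whose weights are
# optimized by SGD on a contrastive stand-in loss: pull graph neighbours
# together, push random pairs apart (up to a margin).
W1 = rng.normal(0, 0.3, (10, 16))
W2 = rng.normal(0, 0.3, (16, 2))
lr, margin = 0.01, 4.0

def forward(x):
    z = x @ W1
    h = np.maximum(z, 0.0)
    return z, h, h @ W2

for _ in range(200):
    i, j = edges[rng.integers(len(edges))]      # attractive (graph) pair
    k = rng.integers(n)                         # random repulsive point
    for a_i, b_i, sign in ((i, j, 1.0), (i, k, -1.0)):
        za, ha, ya = forward(X[a_i])
        zb, hb, yb = forward(X[b_i])
        diff = ya - yb
        if sign < 0 and np.linalg.norm(diff) > margin:
            continue                            # already far enough apart
        g = sign * 2 * diff                     # d(loss)/d(ya)
        for z, h, x, gy in ((za, ha, X[a_i], g), (zb, hb, X[b_i], -g)):
            gh = gy @ W2.T                      # backprop through W2
            W2 -= lr * np.outer(h, gy)
            W1 -= lr * np.outer(x, gh * (z > 0))

# The learned mapping embeds all points (including new ones) in one pass --
# the "fast online embeddings" benefit of the parametric variant.
emb = np.maximum(X @ W1, 0) @ W2
```

The point of the sketch is structural: because the embedding lives in the network weights, unseen data can be projected without re-running the graph optimization, which is exactly what the nonparametric variant cannot do.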
Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species' vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present here a set of computational methods that center around projecting animal vocalizations into low dimensional latent representational spaces that are directly learned from data. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates, enabling high-powered comparative analyses of unbiased acoustic features in the communicative repertoires across species. Latent projections uncover complex features of data in visually intuitive and quantifiable ways. We introduce methods for analyzing vocalizations as both discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication. Finally, we show how systematic sampling from latent representational spaces of vocalizations enables comprehensive investigations of perceptual and neural representations of complex and ecologically relevant acoustic feature spaces.

Of the thousands of species that communicate vocally, the repertoires of only a tiny minority have been characterized or studied in detail. This is due, in large part, to traditional analysis methods that require a high level of expertise that is hard to develop and often species-specific. Here, we present a set of novel methods to project animal vocalizations into latent feature spaces to quantitatively compare and develop visual intuitions about animal vocalizations, and to systematically synthesize novel species-typical vocalizations from learned feature sets. We demonstrate these methods across a series of analyses over 19 datasets of animal vocalizations from 29 different species, including songbirds, mice, monkeys, humans, and whales. We show how learned latent feature spaces untangle complex spectro-temporal structure, enable unbiased comparisons, and uncover high-level features such as individual identity and population dialects. We generate smoothly varying morphs between vocalizations from a songbird species with a spectro-temporally complex vocal repertoire, European starlings, and show how these methods enable a new degree of control over ecologically relevant signals that can be broadly applied across behavioral and physiological experimental settings.

Introduction

Vocal communication is a social behavior common to much of the animal kingdom in which acoustic signals are transmitted from sender to receiver to convey various forms of information such as identity, individual fitness, or the presence ...
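The smoothly varying morphs described above amount to interpolating between latent codes and decoding each intermediate point back to a spectrogram. The paper trains a generative neural network on starling song; the sketch below substitutes a linear PCA encoder/decoder on synthetic "calls" so the idea stays dependency-free, with all data and the codec being illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)

def toy_call(peak, n_freq=32, n_time=16):
    """Toy flattened 'spectrogram' with an energy band around peak."""
    s = rng.normal(0, 0.05, (n_freq, n_time))
    s[peak - 2 : peak + 2] += 1.0
    return s.ravel()

# A small corpus spanning four toy call variants.
X = np.stack([toy_call(p) for p in (6, 8, 22, 24) for _ in range(25)])

# Linear latent codec (PCA) standing in for a learned generative model.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
encode = lambda x: (x - mu) @ Vt[:2].T   # spectrogram -> 2-D latent code
decode = lambda z: mu + z @ Vt[:2]       # latent code  -> spectrogram

# Smooth morph: interpolate the latent codes of two calls, decode each step.
z_a, z_b = encode(X[0]), encode(X[-1])
morph = [decode((1 - t) * z_a + t * z_b) for t in np.linspace(0, 1, 8)]
```

Each decoded step is a plausible point on the latent path between the two endpoint calls; with a trained generative model, the intermediate decodes are the species-typical morph stimuli used for behavioral and physiological experiments.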
Background: The manual detection, analysis and classification of animal vocalizations in acoustic recordings is laborious and requires expert knowledge. Hence, there is a need for objective, generalizable methods that detect underlying patterns in these data, categorize sounds into distinct groups and quantify similarities between them. Among all computational methods that have been proposed to accomplish this, neighbourhood‐based dimensionality reduction of spectrograms to produce a latent space representation of calls stands out for its conceptual simplicity and effectiveness. Goal of the study/what was done: Using a dataset of manually annotated meerkat Suricata suricatta vocalizations, we demonstrate how this method can be used to obtain meaningful latent space representations that reflect the established taxonomy of call types. We analyse strengths and weaknesses of the proposed approach, give recommendations for its usage and show application examples, such as the classification of ambiguous calls and the detection of mislabelled calls. What this means: All analyses are accompanied by example code to help researchers realize the potential of this method for the study of animal vocalizations.
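The two application examples mentioned, classifying ambiguous calls and detecting mislabelled ones, both reduce to nearest-neighbour queries in the latent space. A minimal sketch on synthetic 2-D latent coordinates (stand-ins for actual embeddings of meerkat call spectrograms; all names and data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy latent coordinates for two hypothetical call types.
latent = np.vstack([rng.normal(0, 0.5, (40, 2)),
                    rng.normal(4, 0.5, (40, 2))])
labels = np.array([0] * 40 + [1] * 40)
labels[3] = 1                     # deliberately mislabelled call

def neighbours(point, k=7, skip_self=False):
    """Indices of the k nearest annotated calls in latent space."""
    d = np.linalg.norm(latent - point, axis=1)
    order = np.argsort(d)
    return order[1 : k + 1] if skip_self else order[:k]

# Classify an ambiguous call by majority vote of its latent neighbourhood.
ambiguous = np.array([3.8, 4.1])
vote = np.bincount(labels[neighbours(ambiguous)]).argmax()

# Flag calls whose annotation disagrees with their neighbourhood.
flags = [i for i in range(len(latent))
         if np.bincount(labels[neighbours(latent[i], skip_self=True)]).argmax()
            != labels[i]]
print(vote, flags)
```

The same neighbourhood logic scales to real embeddings: a call whose annotated type disagrees with the majority label of its latent neighbours is a candidate for re-inspection, and an unannotated call inherits the label of its neighbourhood.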