Humans regularly produce new utterances that are understood by other members of the same language community¹. Linguistic theories account for this ability through the use of syntactic rules (or generative grammars) that describe the acceptable structure of utterances². The recursive, hierarchical embedding of language units (for example, words or phrases within larger sentences) that is part of the ability to construct new utterances minimally requires a 'context-free' grammar²,³ that is more complex than the 'finite-state' grammars thought sufficient to specify the structure of all non-human communication signals. Recent hypotheses make the central claim that the capacity for syntactic recursion forms the computational core of a uniquely human language faculty⁴,⁵. Here we show that European starlings (Sturnus vulgaris) accurately recognize acoustic patterns defined by a recursive, self-embedding, context-free grammar. They are also able to classify new patterns defined by the grammar and reliably exclude agrammatical patterns. Thus, the capacity to classify sequences from recursive, centre-embedded grammars is not uniquely human. This finding opens a new range of complex syntactic processing mechanisms to physiological investigation.

The computational complexity of generative grammars is formally defined³ such that certain classes of temporally patterned strings can only be produced (or recognized) by specific classes of grammars (Fig. 1). Starlings sing long songs composed of iterated motifs (smaller acoustic units)⁶ that form the basic perceptual units of individual song recognition⁷⁻⁹. Here we used eight 'rattle' and eight 'warble' motifs (see Methods) to create complete 'languages' (4,096 sequences) for two distinct grammars: a context-free grammar (CFG) of the form A²B² that entails recursive centre-embedding, and a finite-state grammar (FSG) of the form (AB)² that does not (Fig. 2a, b; 'A' refers to rattles and 'B' to warbles).

We trained 11 European starlings, using a go/no-go operant conditioning procedure, to classify subsets of sequences from these languages (see Methods and Supplementary Information). Nine of the eleven starlings learned to classify the FSG and CFG sequences accurately (as assessed by d', which provides an unbiased measure of sensitivity in differentiating between two classes of patterns), but the task was difficult (Fig. 2c). The rate of acquisition varied widely among the starlings that learned the task (303.44 ± 57.11 blocks to reach criterion (mean ± s.e.m.), range 94–562 blocks with 100 trials per block), and was slow by comparison with other operant song-recognition tasks⁷.

To assess the possibility that starlings learned to classify the motif patterns described by the CFG and FSG through rote memorization of the training exemplars, we further tested the birds with novel sequences from each grammar (Fig. 3a). The mean d' over the first 100 trials with new stimuli (roughly six responses to each exemplar) was 1.08 ± 0.50, which is significantly better than chance performance (d' = 0). Over th...
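To make the two stimulus classes and the d' measure concrete, here is a minimal sketch (ours, not the authors' code; it assumes scipy is available, and the motif labels are placeholders for the eight natural rattle and eight warble motifs):

```python
# Sketch of the two stimulus 'languages' and the d' sensitivity measure.
from itertools import product
from scipy.stats import norm

rattles = [f"A{i}" for i in range(8)]  # 'A' motifs (rattles), placeholders
warbles = [f"B{i}" for i in range(8)]  # 'B' motifs (warbles), placeholders

# Context-free pattern A²B² (one level of centre-embedding): A A B B
cfg = list(product(rattles, rattles, warbles, warbles))
# Finite-state pattern (AB)²: A B A B
fsg = list(product(rattles, warbles, rattles, warbles))

assert len(cfg) == len(fsg) == 8**4 == 4096  # the complete 'languages'

def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

# e.g. responding 'go' on 80% of S+ trials and 30% of S- trials:
print(round(d_prime(0.80, 0.30), 2))  # 1.37; d' = 0 is chance
```

Note that 8⁴ = 4,096, which is where the size of each complete language comes from: every position in the four-motif sequence is filled independently from the appropriate set of eight motifs.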
The neural representations associated with learned auditory behaviours, such as recognizing individuals based on their vocalizations, are not well described. Higher vertebrates learn to recognize complex conspecific vocalizations that comprise sequences of easily identified, naturally occurring auditory objects, which should facilitate the analysis of higher auditory pathways. Here we describe the first example of neurons selective for learned conspecific vocalizations in adult animals: starlings that have been trained operantly to recognize conspecific songs. The neuronal population is found in a non-primary forebrain auditory region, exhibits increased responses to the set of learned songs compared with novel songs, and shows differential responses to categories of learned songs based on recognition training contingencies. Within the population, many cells respond highly selectively to a subset of specific motifs (acoustic objects) present only in the learned songs. Such neuronal selectivity may contribute to song-recognition behaviour, which in starlings is sensitive to motif identity. In this system, both top-down and bottom-up processes may modify the tuning properties of neurons during recognition learning, giving rise to plastic representations of behaviourally meaningful auditory objects.
Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species' vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods for projecting animal vocalizations into low-dimensional latent representational spaces that are directly learned from the spectrograms of vocal signals. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates. Latent projections uncover complex features of data in visually intuitive and quantifiable ways, enabling high-powered comparative analyses of vocal acoustics. We introduce methods for analyzing vocalizations as both discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication.
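As one concrete reading of this pipeline, the sketch below projects fixed-size log-mel spectrograms of vocal elements into a 2-D latent space with UMAP. It is a simplified stand-in for the published workflow (segmentation, alignment, and other preprocessing are omitted), it assumes librosa and umap-learn are installed, and the noise bursts are stand-ins for real segmented syllables or calls:

```python
# Sketch: spectrogram -> flattened feature vector -> learned 2-D latent space.
import numpy as np
import librosa
import umap

sr = 22050  # sample rate (assumption)

def features(y, n_mels=32, n_frames=64):
    """Fixed-size log-mel spectrogram, flattened to a vector."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    S = librosa.power_to_db(S)
    S = librosa.util.fix_length(S, size=n_frames, axis=1)  # common width
    return S.flatten()

# Stand-in data: in practice each element is a segmented syllable or call
# loaded from recordings; noise bursts are used here so the sketch runs.
elements = [np.random.randn(sr // 4) for _ in range(200)]
X = np.stack([features(y) for y in elements])

# Nonlinear projection learned directly from the spectrograms
embedding = umap.UMAP(n_components=2).fit_transform(X)
print(embedding.shape)  # (200, 2): one point per vocal element
```

The key design choice mirrored here is that the representation is learned from the spectrograms themselves rather than from hand-picked acoustic features, which is what sidesteps human perceptual bias in characterizing the repertoire.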
Human speech possesses a rich hierarchical structure that allows for meaning to be altered by words spaced far apart in time. Conversely, the sequential structure of nonhuman communication is thought to follow non-hierarchical Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from multiple languages by modeling the information content of signals as a function of the sequential distance between vocal elements. Across short sequence-distances, an exponential decay dominates the information in speech and birdsong, consistent with underlying Markovian processes. At longer sequence-distances, the decay in information follows a power law, consistent with underlying hierarchical processes. Thus, the sequential organization of acoustic elements in two learned vocal communication signals (speech and birdsong) shows functionally equivalent dynamics, governed by similar processes.
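A hedged sketch of the core measurement follows: the plug-in estimate of mutual information between elements separated by a given sequence-distance. The published analysis uses more careful bias-corrected estimators and explicit model comparison between exponential and power-law decay; this shows only the shape of the computation, on a toy random sequence rather than real syllable or phoneme data:

```python
# Sketch: mutual information (bits) as a function of sequence-distance d.
import numpy as np
from collections import Counter

def mutual_information(seq, d):
    """Plug-in MI estimate between elements at distance d in seq."""
    pairs = list(zip(seq[:-d], seq[d:]))
    n = len(pairs)
    p_xy = Counter(pairs)                 # joint counts
    p_x = Counter(x for x, _ in pairs)    # marginal counts, first element
    p_y = Counter(y for _, y in pairs)    # marginal counts, second element
    return sum((c / n) * np.log2((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
               for (x, y), c in p_xy.items())

# Toy sequence of discrete element labels; the real analyses used
# birdsong syllables and speech phonemes.
rng = np.random.default_rng(0)
seq = list(rng.integers(0, 10, size=100_000))

distances = np.arange(1, 101)
mi = np.array([mutual_information(seq, int(d)) for d in distances])
# Exponential (Markovian) decay looks linear on semi-log axes (log MI vs d);
# power-law (hierarchical) decay looks linear on log-log axes.
```

Plotting the decay curve on both axis scalings is what distinguishes the two regimes the abstract describes: an exponential component dominating at short distances and a power-law component at long distances.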