Our ability to invariably identify words spoken with different accents and timbres requires our perception to tolerate multiple acoustic modulations. However, the mechanisms by which lexical information is perceived as invariant are unknown. In this study, we explored the ability of two trained rhesus monkeys to recognise many sounds, including multisyllabic words presented in numerous variations. We found that the monkeys' lexical representations are remarkably tolerant of dynamic acoustic changes. Furthermore, we determined that the macaques invariably recognise sounds with formants at close Euclidean distances from a learned category. Based on our results, we propose the possible existence of neuronal circuits responsible for the invariant recognition of lexical representations in macaques.

Although sound recognition is vital for communication in primates, the perceptual basis of the invariant recognition of sounds has not been extensively investigated. One possible reason for this is that non-human primates may demonstrate only limited acoustic learning 4-6, and that their recognition capability may therefore depend on genetically programmed circuits 7-9. However, macaques are known to be capable of learning repertoires of visual categories 10 and can even report the existence of objects from ambiguous or incomplete information 11. Studies of the inferotemporal and prefrontal cortices of monkeys have revealed neurons whose categorical responses group wide variations of images 12-14, which is consistent with perceptual reports 15,16. Similarly, experiments in the prefrontal cortex and secondary acoustic areas suggest invariant coding of acoustic categories 17-25. In this paper, we sought to determine which acoustic parameters underlie the invariant recognition of sounds (IRS) in trained non-human primates 26.
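The claim that macaques recognise sounds whose formants lie at close Euclidean distances from a learned category can be illustrated with a minimal sketch. This is not the authors' analysis: the category names and (F1, F2) formant values below are hypothetical, and the sketch simply assigns a sound to whichever learned category prototype is nearest in formant space.

```python
import math

# Hypothetical learned categories, each summarised by a prototype of mean
# formant frequencies (F1, F2) in Hz -- illustrative values, not measured data.
learned_categories = {
    "word_A": (700.0, 1200.0),
    "word_B": (300.0, 2300.0),
}

def euclidean(p, q):
    """Euclidean distance between two formant vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_category(formants, categories):
    """Assign a sound to the learned category whose (F1, F2)
    prototype is closest in Euclidean distance."""
    return min(categories, key=lambda c: euclidean(formants, categories[c]))

# A novel rendition whose formants fall near word_A's prototype
print(nearest_category((680.0, 1250.0), learned_categories))  # word_A
```

Under this reading, tolerance to accent and timbre corresponds to a neighbourhood around each prototype: renditions within a small Euclidean distance of a learned category are treated as instances of it.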
We hypothesised that monkeys invariably recognise sounds whose frequency saliences have dynamics similar to those of sounds the monkeys had learned 27,28. To test this, we designed a novel paradigm in which the macaques efficiently reported target (T) sounds presented in sequences that included nontarget (N) sounds. We found that the monkeys recognised as invariant those sound fragments with frequency modulations in the range of the prominences of the learned sounds. Our results allowed us to elucidate, for the first time, the acoustic parameters 29-34 that lead to monkeys' IRS. We propose that the ability to group complex stimuli into particular categories is served by chunks of prominent features of the stimuli.

Results

We trained two rhesus monkeys in an acoustic recognition task in order to study the invariant recognition of sounds. During the task, the monkeys obtained a reward for releasing a lever after identifying a T presented after zero, one, or two Ns (Fig. 1a-c; see Methods). After two years of training, monkey V recognised seven Ts and twenty-one Ns, and monkey X recognised eleven Ts a…
In human speech and communication across various species, recognizing sounds is fundamental for the selection of appropriate behaviors. But how does the brain decide which action to perform based on sounds? We explored whether the premotor supplementary motor area (SMA), responsible for linking sensory information to motor programs, also accounts for auditory-driven decision making. To this end, we trained two rhesus monkeys to discriminate between numerous naturalistic sounds and words learned as target (T) or non-target (nT) categories. We demonstrated that the neural population is organized differently during the auditory and the movement periods of the task, implying that it is performing different computations in each period. We found that SMA neurons perform acoustic-decision-related computations that transition from auditory to movement representations in this task. Our results suggest that the SMA integrates sensory information while listening to auditory stimuli in order to form categorical signals that drive behavior.
In social animals, identifying sounds is critical for communication. In humans, the acoustic parameters involved in speech recognition, such as the formant frequencies derived from the resonance of the supralaryngeal vocal tract, have been well documented. However, how formants contribute to recognizing learned sounds in non-human primates remains unclear. To determine this, we trained two rhesus monkeys to discriminate target and non-target sounds presented in sequences of 1–3 sounds. After training, we performed three experiments, testing: (1) the monkeys’ accuracy and reaction times during the discrimination of various acoustic categories; (2) their ability to discriminate morphing sounds; and (3) their ability to identify sounds passed through formant 1 (F1), formant 2 (F2), or F1 and F2 (F1F2) filters. Our results indicate that macaques can learn diverse sounds and discriminate morphs and the formants F1 and F2, suggesting that information from a few acoustic parameters suffices for recognizing complex sounds. We anticipate that future neurophysiological experiments in this paradigm may help elucidate how formants contribute to the recognition of sounds.
The supplementary motor area (SMA) of the brain is critical for integrating memory and sensory signals into perceptual decisions. For example, in macaques, SMA activity correlates with decisions based on the comparison of sounds 1. In humans, functional MRI shows SMA activation during the invariant recognition of words pronounced by different speakers 2. Nevertheless, the neuronal correlates of perceptual invariance are unknown. Here we show that the SMA of macaques associates novel sounds with behaviors triggered by similar learned categories when recognizing sounds such as words. Notably, the neuronal activity at the single-neuron and population levels correlates with the monkeys’ behaviors (e.g., hits and false alarms). Our results demonstrate that the invariant recognition of complex sounds involves premotor computations in areas other than the temporal and parietal speech areas. Therefore, we propose that perceptual invariance depends on motor predictions and not only on sensory representations. We anticipate that studies on speech will observe sensory-motor transformations of acoustic information into motor skills.