Interaction with the world is a multisensory experience, but most of what
is known about the neural correlates of perception comes from studying vision.
Auditory input enters cortex with its own set of unique qualities and is used in
oral communication, speech, music, and the understanding of emotional and
intentional states of others, all of which are central to the human experience.
To better understand how the auditory system develops, how it recovers after
injury, and how its functions may have transitioned over the course of hominin
evolution, advances are needed in models of how the human brain is organized to
process real-world natural sounds and “auditory objects”. This
review presents a simple fundamental neurobiological model of hearing perception
at a category level that incorporates principles of bottom-up signal processing
together with top-down constraints of grounded cognition theories of knowledge
representation. Though mostly derived from human neuroimaging literature, this
theoretical framework highlights rudimentary principles of real-world sound
processing that may apply to most if not all mammalian species with hearing and
acoustic communication abilities. The model encompasses three basic categories
of sound-source: (1) action sounds (non-vocalizations) produced by
‘living things’, with human (conspecific) and non-human animal
sources representing two subcategories; (2) action sounds produced by
‘non-living things’, including environmental sources and
human-made machinery; and (3) vocalizations (‘living things’),
with human versus non-human animals as two subcategories therein. The model is
presented in the context of cognitive architectures relating to multisensory,
sensory-motor, and spoken language organizations. The model’s predictive
value is further discussed in the context of anthropological theories of oral
communication evolution and the neurodevelopment of spoken language
proto-networks in infants/toddlers. These phylogenetic and ontogenetic
frameworks both entail cortical network maturations that are proposed to be
organized, at least in part, around a number of universal acoustic-semantic signal
attributes of natural sounds, which are addressed herein.