Ethological views of brain functioning suggest that sound representations and computations in the auditory neural system are finely optimized to process and discriminate behaviorally relevant acoustic features and sounds (e.g., spectrotemporal modulations in the songs of zebra finches). Here, we show that modeling of neural sound representations in terms of frequency-specific spectrotemporal modulations enables accurate and specific reconstruction of real-life sounds from high-resolution functional magnetic resonance imaging (fMRI) response patterns in the human auditory cortex. Region-based analyses indicated that response patterns in separate portions of the auditory cortex are informative of distinctive sets of spectrotemporal modulations. Most relevantly, results revealed that in early auditory regions, and progressively more in surrounding regions, temporal modulations in a range relevant for speech analysis (∼2-4 Hz) were reconstructed more faithfully than other temporal modulations. In early auditory regions, this effect was frequency-dependent and only present for lower frequencies (<∼2 kHz), whereas for higher frequencies, reconstruction accuracy was higher for faster temporal modulations. Further analyses suggested that auditory cortical processing optimized for the fine-grained discrimination of speech and vocal sounds underlies this enhanced reconstruction accuracy. In sum, the present study introduces an approach to embed models of neural sound representations in the analysis of fMRI response patterns. Furthermore, it reveals that, in the human brain, even general-purpose and fundamental neural processing mechanisms are shaped by the physical features of real-world stimuli that are most relevant for behavior (i.e., speech, voice).

Many natural and man-made sources in our environment produce acoustic waveforms (sounds) consisting of complex mixtures of multiple frequencies (Fig. 1A).
At the cochlea, these waveforms are decomposed into frequency-specific temporal patterns of neural signals, typically described with auditory spectrograms (Fig. 1B). How complex sounds are further transformed and analyzed along the auditory neural pathway and in the cortex remains uncertain. Ethological considerations have led to the hypothesis that brain processing of sounds is optimized for spectrotemporal modulations, which are characteristically present in ecologically relevant sounds (1), such as in animal vocalizations [e.g., zebra finch (2), macaque monkeys (3)] and speech (2, 4-7). Modulations are regular variations of energy in time, in frequency, or in time and frequency simultaneously. Typically, in natural sounds, modulations dynamically change over time. The contribution of modulations to sound spectrograms can be made explicit using 4D (time-varying; Movies S1-S3, Left) or 3D (time-averaged; Fig. 1C) representations. Such representations highlight, for example, that the energy of human vocal sounds is mostly concentrated at lower frequencies and at lower temporal modulation rates (Fig. 1C, Left), whereas the...
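The time-averaged modulation representation described above can be illustrated, under simplifying assumptions, as the 2D Fourier transform of a spectrogram: a sound whose energy drifts regularly across frequency and time (a "ripple") produces a single peak at its temporal modulation rate (Hz) and spectral modulation scale. The channel count, frame rate, and ripple parameters in this minimal sketch are illustrative choices, not the analysis pipeline used in the study.

```python
import numpy as np

# Illustrative synthetic "spectrogram": 128 frequency channels x 500 time
# frames sampled at 100 frames per second (all values are assumptions).
n_freq, n_time, frame_rate = 128, 500, 100.0
t = np.arange(n_time) / frame_rate          # time axis (s)
f = np.arange(n_freq)                        # channel index

# A ripple: energy modulated at 3 Hz in time and 0.0625 cycles/channel
# in frequency, i.e., a joint spectrotemporal modulation.
rate_hz = 3.0
scale_cyc = 0.0625
spec = 1.0 + np.cos(2 * np.pi * (scale_cyc * f[:, None] + rate_hz * t[None, :]))

# Time-averaged modulation spectrum via a 2D FFT (DC term removed);
# axes after fftshift: spectral scale (cycles/channel) x temporal rate (Hz).
mps = np.abs(np.fft.fftshift(np.fft.fft2(spec - spec.mean())))
scales = np.fft.fftshift(np.fft.fftfreq(n_freq))
rates = np.fft.fftshift(np.fft.fftfreq(n_time, d=1.0 / frame_rate))

# The dominant peak recovers the ripple's modulation parameters.
i, j = np.unravel_index(np.argmax(mps), mps.shape)
print(abs(scales[i]), abs(rates[j]))
```

Real auditory spectrograms contain mixtures of many such ripples changing over time, which is why the full representation is 4D (time-varying) and the figure shows its time-averaged 3D summary.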