Sensory processing by neural circuits includes numerous nonlinear transformations that are critical to perception. Our understanding of these nonlinear mechanisms, however, is hindered by the lack of a comprehensive and interpretable computational framework that can model and explain nonlinear signal transformations. Here, we propose a data-driven framework based on deep neural network regression models that can directly learn any nonlinear stimulus-response mapping. A key component of this approach is an analysis method that reformulates the exact function of the trained neural network as a collection of stimulus-dependent linear functions.This locally linear receptive field interpretation of the network function enables straightforward comparison with conventional receptive field models and uncovers nonlinear encoding properties. We demonstrate the efficacy of this framework by predicting the neural responses recorded invasively from the auditory cortex of neurosurgical patients as they listened to speech.Our method significantly improves the prediction accuracy of auditory cortical responses particularly in nonprimary areas. Moreover, interpreting the functions learned by neural networks uncovered three distinct types of nonlinear transformations of speech that varied considerably in primary and nonprimary auditory regions. By combining two desired properties of a computational sensory-response model; the ability to capture arbitrary stimulus-response mappings and maintaining model interpretability, this data-driven method can lead to better neurophysiological models of the sensory processing.