As the number of audio streams available through diverse sources and distribution channels grows, effective computational audio analysis has received increasing interest in the multimedia field. However, current audio analysis strategies may be hampered by the lack of effective representations of the high-level semantics perceived by humans and by the lack of effective approaches for bridging the gap between low-level acoustic features and high-level semantic features. This semantic gap has become the bottleneck in audio analysis. In this paper, we propose a computational framework to decode biologically plausible auditory saliency using high-level features derived from functional magnetic resonance imaging (fMRI), which monitors the human brain's response to the natural stimulus of audio listening. Specifically, we (1) identify meaningful intrinsic brain networks involved in audio listening via online dictionary learning and sparse representation of whole-brain fMRI signals, (2) reconstruct auditory saliency features using the identified brain network components, and (3) perform group-wise analysis to identify consistent 'brain decoders' of the saliency features across different excerpts and participants. Experimental results demonstrate that the auditory saliency features are effectively decoded by our methods, which may open opportunities for various applications in the multimedia field.