Listening selectively to one out of several competing speakers in a "cocktail party" situation is a highly demanding task. It relies on a widespread cortical network, including auditory sensory, but also frontal and parietal brain regions involved in controlling auditory attention. Previous work has shown that, during selective listening, ongoing neural activity in auditory sensory areas is dominated by the attended speech stream, whereas competing input is suppressed. The relationship between these attentional modulations in the sensory tracking of the attended speech stream and frontoparietal activity during selective listening is, however, not understood. We studied this question in young, healthy human participants (both sexes) using concurrent EEG-fMRI and a sustained selective listening task, in which one out of two competing speech streams had to be attended selectively. An EEG-based speech envelope reconstruction method was applied to assess the strength of the cortical tracking of the to-be-attended and the to-be-ignored stream during selective listening. Our results show that individual speech envelope reconstruction accuracies obtained for the to-be-attended speech stream were positively correlated with the amplitude of sustained BOLD responses in the right temporoparietal junction, a core region of the ventral attention network. This brain region further showed task-related functional connectivity to secondary auditory cortex and regions of the frontoparietal attention network, including the intraparietal sulcus and the inferior frontal gyrus. This suggests that the right temporoparietal junction is involved in controlling attention during selective listening, allowing for a better cortical tracking of the attended speech stream. Listening selectively to one out of several simultaneously talking speakers in a "cocktail party" situation is a highly demanding task. It activates a widespread network of auditory sensory and hierarchically higher frontoparietal brain regions. However, how these different processing levels interact during selective listening is not understood. Here, we investigated this question using fMRI and concurrently acquired scalp EEG. We found that activation levels in the right temporoparietal junction correlate with the sensory representation of a selectively attended speech stream. In addition, this region showed significant functional connectivity to both auditory sensory and other frontoparietal brain areas during selective listening. This suggests that the right temporoparietal junction contributes to controlling selective auditory attention in "cocktail party" situations.