Extracting regularities from ongoing stimulus streams to form predictions is crucial for adaptive behavior. Such regularities exist in terms of the content of the stimuli (i.e., "what" it is) and their timing (i.e., "when" it will occur), both of which are known to interactively modulate sensory processing. In real-world stimulus streams, regularities also occur contextually, e.g., predictions of individual notes vs. melodic contour in music. However, it is unknown whether the brain integrates predictions in a contextually congruent manner (e.g., whether slower "when" predictions selectively interact with complex "what" predictions), and whether integrating predictions of simple vs. complex features relies on dissociable neural correlates. To address these questions, our study employed "what" and "when" violations at two levels, single tones (elements) vs. tone pairs (chunks), within the same stimulus stream, while neural activity was recorded using electroencephalography (EEG) in participants (N=20) performing a repetition detection task. Our results reveal that "what" and "when" predictions interactively modulated stimulus-evoked response amplitude in a contextually congruent manner, but that these modulations were shared between contexts in terms of the spatiotemporal distribution of EEG signals. Effective connectivity analysis using dynamic causal modeling showed that the integration of "what" and "when" predictions selectively increased connectivity at relatively late cortical processing stages, between the superior temporal gyrus and the fronto-parietal network. Taken together, these results suggest that the brain integrates different predictions with a high degree of contextual specificity, but in a shared and distributed cortical network.