This study investigates the effect of spectral degradation on cortical speech encoding in complex auditory scenes. Young normal-hearing listeners were simultaneously presented with two speech streams and were instructed to attend to only one of them. The speech mixtures were subjected to noise-channel vocoding to preserve the temporal envelope and degrade the spectral information of speech. Each subject was tested with five spectral resolution conditions (unprocessed speech, 64-, 32-, 16-, and 8-channel vocoder conditions) and two target-to-masker ratio (TMR) conditions (3 and 0 dB). Ongoing electroencephalographic (EEG) responses and speech comprehension were measured in each spectral and TMR condition for each subject. Neural tracking of each speech stream was characterized by cross-correlating the EEG responses with the envelope of each of the simultaneous speech streams at different time lags. Results showed that spectral degradation and TMR both significantly influenced how top-down attention modulated the EEG responses to the attended and unattended speech. That is, the EEG responses to the attended and unattended speech streams differed more for the higher (unprocessed, 64 ch, and 32 ch) than the lower (16 and 8 ch) spectral resolution conditions, as well as for the higher (3 dB) than the lower TMR (0 dB) condition. The magnitude of differential neural modulation responses to the attended and unattended speech streams significantly correlated with speech comprehension scores. These results suggest that severe spectral degradation and low TMR hinder speech stream segregation, making it difficult to employ top-down attention to differentially process different speech streams.