Auditory attention describes a listeners focus on an acoustic source while they ignore other competing sources that might be present. In an environment with multiple talkers and background noise (i.e. the cocktail party effect), auditory attention can be difficult, requiring the listener to expend measurable cognitive effort. A listener will naturally interrupt sustained attention on a source when switching towards another source during conversation. This change in attention is potentially even more taxing than maintaining sustained attention due to the limits of human working memory, and this additional effort required has not been well studied. In this work, we evaluated an attention decoder algorithm for detecting the change in attention and investigated cognitive effort expended during attentional switching and sustained attention. Two variants of endogenous attention switching were explored: the switches either had in-the-moment decision making or a pre-defined attentional switch time. A least-squares, EEG-based, attention decoding algorithm achieved 64.1% accuracy with a 5-second correlation window and illustrated smooth transitions in the attended talker prediction through switches in sustained attention at approximately half of the analysis window size (2.2 seconds). The expended listening effort, as measured by simultaneous electroencephalography (EEG) and pupillometry, was also a strong indicator of switching. Specifically, centrotemporal alpha power [F(2, 18) = 7.473, P = 0.00434] and mean pupil diameter [F(2, 18) = 9.159, P = 0.0018] were significantly different for trials that contained a switch in comparison to sustained trials. We also found that relative attended and ignored talker locations modulate the EEG alpha topographic response. This alpha lateralization was found to be impacted by the interaction between experimental condition and whether the measure was computed before or after the switch [F(2,18) = 3.227, P = 0.0634]. These results suggest that expended listening effort is a promising feature that should be pursued in a decoding context, in addition to speech and location-based features.