In this paper, we explore the possibility of using simultaneous visual and haptic stimuli to increase human selective listening attention. We measured the listening attention of 30 participants using a real-time self-report method. Specifically, the participants listened to a classical fugue while receiving a corresponding non-auditory stimulus synchronized with the notes of a single instrument in the piece. The participants were asked to press a button when they were able to focus their listening attention on the instrument highlighted by the non-auditory stimulus, and their initial detection time and total focus time were recorded. Three stimulus conditions based on two modalities were compared: visual, haptic, and combined visual-haptic, using three classical polyphonic fugues. The experimental results indicate that, regardless of a participant's music skills or the voice pitch, the participants' performance improved with the visual-haptic stimulus, with longer selective listening periods and faster detection times than with either single-modality stimulus. At the end of the experiment, a questionnaire was administered to assess the participants' subjective ease across the stimulus conditions. The questionnaire results indicated that the participants preferred the visual-haptic stimulus over the visual or haptic stimulus alone.