The ability to discriminate self- and non-self voice cues is a fundamental aspect of self-awareness and subserves self-monitoring during verbal communication. Nonetheless, the neurofunctional underpinnings of self-voice perception and recognition are still poorly understood. Moreover, how attention and stimulus complexity influence the processing and recognition of one's own voice remains to be clarified. Using an oddball task, the current study investigated how self-relevance and stimulus type interact during selective attention to voices, and how they affect the representation of regularity during voice perception. Event-related potentials (ERPs) were recorded from 18 right-handed males. Pre-recorded self-generated (SGV) and non-self (NSV) voices, consisting of a nonverbal vocalization (vocalization condition) or disyllabic word (word condition), were presented as either standard or target stimuli in different experimental blocks. The results showed increased N2 amplitude to SGV relative to NSV stimuli. Stimulus type modulated later processing stages only: P3 amplitude was increased for SGV relative to NSV words, whereas no differences between SGV and NSV were observed in the case of vocalizations. Moreover, SGV standards elicited reduced N1 and P2 amplitude relative to NSV standards. These findings revealed that the self-voice grabs more attention when listeners are exposed to words but not vocalizations. Further, they indicate that detection of regularity in an auditory stream is facilitated for one's own voice at early processing stages. Together, they demonstrate that self-relevance affects attention to voices differently as a function of stimulus type.