One’s own voice undergoes unique processing that distinguishes it from others’ voices, so listening to it may provide a distinct neural basis for self-talk as an emotion regulation strategy. This study aimed to elucidate how the neural effects of one’s own voice differ from those of others’ voices during the implementation of emotion regulation strategies. Twenty-one healthy adults underwent fMRI scanning while listening to sentences synthesized in their own or others’ voices for two strategies: self-affirmation, based on mental commitments to strengthen one’s positive aspects, and cognitive defusion, based on imagining metaphoric actions to shake off negative aspects. An interaction effect between voice identity and strategy was observed in the superior temporal sulcus, middle temporal gyrus, and parahippocampal cortex; activity in these regions indicated that the uniqueness of one’s own voice is reflected more strongly in cognitive defusion than in self-affirmation. This interaction was also seen in the precuneus, suggesting an intertwining of self-referential processing and episodic memory retrieval in self-affirmation with one’s own voice. These results imply that the unique effects of one’s own voice may be expressed differently depending on the type of emotion regulation, according to the degree of engagement of neural sharpening-related regions and self-referential networks.