Emotionally valenced words have not yet been examined empirically in a bilingual population using the emotional face–word Stroop paradigm. Chinese-English bilinguals were asked to identify facial expressions of emotion while task-irrelevant emotion words in their first (L1) or second (L2) language were superimposed on the face pictures. We examined how the emotional content of words modulated behavioral performance and cerebral functioning in the bilinguals’ two languages. The results showed significant congruency effects for both L1 and L2 emotion words, and the magnitude of the Stroop effect differed between the two languages, suggesting that L1 is more capable of activating the emotional response to word stimuli. In the event-related potential data, an N350–550 effect, with greater negativity for incongruent than for congruent trials, was observed only in the L1 task. The size of the N350–550 effect differed across languages, whereas no identifiable language difference was observed in the conflict slow potential (conflict SP) effect. Finally, a more pronounced negative amplitude at 230–330 ms was observed in L1 than in L2, but only for incongruent trials. This negativity, likened to an orthographic decoding N250, may reflect the extent of attention to emotion word processing at the word-form level, while the N350–550 reflects a complex set of processes involved in conflict processing. Overall, the face–word congruency effect showed identifiable language differences at 230–330 and 350–550 ms, providing supporting evidence for theoretical proposals that assume attenuated emotionality of L2 processing.