Speech perception behavioral research suggests that rates of sensory memory decay are dependent on stimulus properties at more than one level (e.g., acoustic level, phonemic level). The neurophysiology of sensory memory decay rate has rarely been examined in the context of speech processing. In a lexical tone study, we showed that long-term memory representation of lexical tone slows the decay rate of sensory memory for these tones. Here, we tested the hypothesis that long-term memory representation of vowels slows the rate of auditory sensory memory decay in a similar way to that of lexical tone. Event-related potential (ERP) responses were recorded to Mandarin non-words contrasting the vowels /i/ vs. /u/ and /y/ vs. /u/ from first-language (L1) Mandarin and L1 American English participants under short and long interstimulus interval (ISI) conditions (short ISI: an average of 575 ms, long ISI: an average of 2675 ms). Results revealed poorer discrimination of the vowel contrasts for English listeners than Mandarin listeners, but with different patterns for behavioral perception and neural discrimination. As predicted, English listeners showed the poorest discrimination and identification for the vowel contrast /y/ vs. /u/, and poorer performance in the long ISI condition. In contrast to Yu et al. (2017), however, we found no effect of ISI reflected in the neural responses, specifically the mismatch negativity (MMN), P3a and late negativity ERP amplitudes. We did see a language group effect, with Mandarin listeners generally showing larger MMN and English listeners showing larger P3a. The behavioral results revealed that native language experience plays a role in echoic sensory memory trace maintenance, but the failure to find an effect of ISI on the ERP results suggests that vowel and lexical tone memory traces decay at different rates.Highlights:We examined the interaction between auditory sensory memory decay and language experience.We compared MMN, P3a, LN and behavioral responses in short vs. long interstimulus intervals.We found that different from lexical tone contrast, MMN, P3a, and LN changes to vowel contrasts are not influenced by lengthening the ISI to 2.6 s.We also found that the English listeners discriminated the non-native vowel contrast with lower accuracy under the long ISI condition.