Although it is challenging, adults can learn non-native phonetic contrasts with extensive training [1, 2], indicative of perceptual learning beyond an early sensitivity period [3, 4]. Training can alter low-level sensory encoding of newly acquired speech sound patterns [5]; however, the time-course, behavioral relevance, and long-term retention of such sensory plasticity are unclear. Some theories argue that sensory plasticity underlying signal enhancement is immediate and critical to perceptual learning [6, 7]. Others, like the reverse hierarchy theory (RHT), posit a slower time-course for sensory plasticity [8]. RHT proposes that higher-level categorical representations guide immediate, novice learning, while lower-level sensory changes do not emerge until expert stages of learning [9]. We trained 20 English-speaking adults to categorize a non-native phonetic contrast (Mandarin lexical tones) using a criterion-dependent sound-to-category training paradigm. Sensory and perceptual indices were assayed across operationally defined learning phases (novice, experienced, over-trained, and 8-week retention) by measuring the frequency-following response, a neurophonic potential that reflects fidelity of sensory encoding, and the perceptual identification of a tone continuum. Our results demonstrate that although robust changes in sensory encoding and perceptual identification of Mandarin tones emerged with training and were retained, these changes followed different timescales. Sensory changes were evident, and related to behavioral performance, only when participants were over-trained. In contrast, changes in perceptual identification, reflecting an improved categorical percept, emerged earlier. Individual differences in perceptual identification, and not in sensory encoding, were related to faster learning. Our findings support the RHT: sensory plasticity accompanies, rather than drives, expert levels of non-native speech learning.