Speech processing is highly modulated by context. Prior studies examining frequency-following responses (FFRs), an electrophysiological 'neurophonic' potential that faithfully reflects phase-locked activity from neural ensembles within the auditory network, have demonstrated that stimulus context modulates the integrity of speech encoding. The extent to which context-dependent encoding reflects general auditory properties or interactions between statistical and higher-level linguistic processes remains unexplored. Our study examined whether speech encoding, as reflected by FFRs, is modulated by abstract phonological relationships between a stimulus and its surrounding context. FFRs were elicited from 17 native listeners to a Mandarin rising-tone syllable (/ji-TR/, 'second') presented randomly with other syllables in three contexts. In a contrastive context, /ji-TR/ occurred with meaning-contrastive high-level-tone syllables (/ji-H/, 'one'). In an allotone context, /ji-TR/ occurred with dipping-tone syllables (/ji-D/), a non-meaning-contrastive variant of /ji-TR/. In a repetitive context, the same /ji-TR/ occurred with other speech tokens of /ji-TR/. Consistent with prior work, neural tracking of the /ji-TR/ pitch contour was more faithful in the repetitive condition, wherein /ji-TR/ occurred more predictably (p = 1), than in the contrastive condition (p = 0.34). Crucially, neural tracking of /ji-TR/ was also more accurate in the allotone context than in the contrastive context, despite both having an identical transitional probability (p = 0.34). Mechanistically, the non-meaning-contrastive relationship may have augmented the probability of /ji-TR/ occurrence in the allotone context. Results indicate online interactions between bottom-up and top-down mechanisms, which facilitate speech perception. Such interactions may predictively fine-tune incoming speech encoding using linguistic and statistical information from prior context.