Natural speech perception requires processing the ongoing acoustic input while keeping in mind the preceding one and predicting the next. This complex computational problem could be handled by a dynamic multi-timescale hierarchical inferential process that coordinates the information flow up and down the language network hierarchy. Using a predictive coding computational model (Precoss-β) that identifies online individual syllables from continuous speech, we address the advantage of a rhythmic modulation of up and down information flows, and whether beta oscillations could be optimal for this. In the model, and consistent with experimental data, theta and low-gamma neural frequency scales ensure syllable-tracking and phoneme-level speech encoding, respectively, while the beta rhythm is associated with inferential processes. We show that a rhythmic alternation of bottom-up and top-down processing regimes improves syllable recognition, and that optimal efficacy is reached when the alternation of bottom-up and top-down regimes, via oscillating prediction error precisions, is in the beta range (around 20–30 Hz). These results not only demonstrate the advantage of a rhythmic alternation of up- and down-going information, but also that the low-beta range is optimal given sensory analysis at theta and low-gamma scales. While specific to speech processing, the notion of alternating bottom-up and top-down processes with frequency multiplexing might generalize to other cognitive architectures.