The ability to attend to specific moments in time is crucial for survival across species, facilitating perception and motor performance by leveraging prior temporal knowledge for predictive processing. Despite its importance, how macro-scale and meso-scale neural resources are used during temporal processing, and how they relate to behavioural strategies and motor responses, remains largely unexplored. To investigate whether the predictive temporal structure of multisensory stimuli can optimize motor behaviour, we established a behavioural paradigm in which mice were trained on an auditory cue and a visual target presented at expected or unexpected temporal delays. Using a combination of stimulus-evoked and resting-state functional magnetic resonance imaging, we examined task-related evoked activity in brain-wide networks and found that the formation of temporal expectations relying on accumulated sensory information and combined multisensory input involves plasticity across macro-scale cortical networks comprising primary sensory systems, sensory association areas including posterior parietal cortex, retrosplenial cortex, prefrontal top-down executive control centres, and hippocampal networks. Additionally, using in vivo two-photon calcium imaging, we explored local single-cell dynamics within the posterior parietal cortex during this task and found that temporal expectation could be decoded directly from neuronal activity within this region. Overall, our study provides insights into the neural correlates underlying the formation of multisensory temporal expectations in the mouse brain and highlights the recruitment of neural resources across temporally driven statistical learning processes.