Inner speech is a silent verbal experience that plays a central role in human consciousness and cognition. Despite decades of research, the neural mechanisms of inner speech remain largely unknown. In this study, we adopted an ecological paradigm called situationally simulated inner speech. Unlike the mere imagined speech of words, situationally simulated inner speech involves the dynamic integration of contextual background, episodic and semantic memories, and external events into a coherent structure. We conducted dynamic activation and network analyses on fMRI data acquired while participants engaged in inner speech prompted by cue words across 10 different contextual backgrounds. Seed-based co-activation pattern analyses revealed dynamic involvement of the language network, sensorimotor network, and default mode network in situationally simulated inner speech. Additionally, frame-wise dynamic conditional correlation analysis uncovered four temporally recurring states with distinct functional connectivity patterns among these networks. We propose a triple network model for deliberate inner speech, comprising the language network for a truncated form of overt speech, the sensorimotor network for perceptual simulation and monitoring, and the default mode network for integration and ‘sense-making’ processing.

Highlights
Across ten contextual backgrounds, participants were instructed to perform situationally simulated inner speech based on cue words.
The ventral parts of the bilateral somatosensory areas and the middle superior temporal gyrus served as seed regions for the seed-based co-activation pattern analyses.
A triple network model comprising the language network, sensorimotor network, and default mode network is proposed for deliberate inner speech.