The neural correlates of sentence production are typically studied using task paradigms that differ considerably from the experience of speaking outside of an experimental setting. In this fMRI study, we aimed to gain a better understanding of syntactic processing in spontaneous production versus naturalistic comprehension in three regions of interest (BA44, BA45, and left posterior middle temporal gyrus). One group of participants (n = 16) was asked to speak in the scanner about the events of an episode of a TV series. A second group of participants (n = 36) listened to the spoken recall of a participant from the first group. To model syntactic processing, we extracted word-by-word metrics of phrase-structure building with a top–down and a bottom–up parser, which embody different hypotheses about the timing of structure building. The top–down parser anticipates syntactic structure, sometimes before it is obvious to the listener, whereas the bottom–up parser builds syntactic structure integratively, after all of the evidence has been presented. In comprehension, neural activity was better modeled by the bottom–up parser, while in production, it was better modeled by the top–down parser. We additionally modeled structure building in production with two strategies that were developed here to make different predictions about the incrementality of structure building during speaking. We found evidence for highly incremental and anticipatory structure building in production, which was confirmed by a converging analysis of the pausing patterns in speech. Overall, this study shows the feasibility of studying the neural dynamics of spontaneous language production.
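To make the contrast between the two parsers concrete, the following is a minimal sketch of how word-by-word structure-building counts could be derived from a bracketed constituency tree. It is not the pipeline used in this study: the function name `node_counts`, the Penn-Treebank-style input format, and the choice to exclude part-of-speech (preterminal) nodes from the counts are all assumptions made for illustration. The top–down count for a word is taken to be the number of phrasal nodes opened just before it, and the bottom–up count the number of phrasal nodes closed just after it.

```python
import re


def node_counts(bracketed):
    """Return (words, top_down, bottom_up) for a well-formed bracketed tree.

    Illustrative assumptions (not taken from the study):
    - input looks like "(S (NP (DT the) (NN dog)) (VP (VBD barked)))"
    - every "(" is immediately followed by a node label
    - preterminal (part-of-speech) nodes are excluded from both counts
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    words, top_down, bottom_up = [], [], []
    opened = 0  # nodes opened since the previous word
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            opened += 1
            i += 2  # skip the label that follows the bracket
            continue
        if tok == ")":
            # A closing bracket is credited to the most recent word.
            bottom_up[-1] += 1
        else:
            words.append(tok)
            # The last node opened is the preterminal, so subtract it.
            top_down.append(opened - 1)
            # Start at -1 so the preterminal's own closing bracket is not counted.
            bottom_up.append(-1)
            opened = 0
        i += 1
    return words, top_down, bottom_up


words, td, bu = node_counts("(S (NP (DT the) (NN dog)) (VP (VBD barked)))")
for w, t, b in zip(words, td, bu):
    print(f"{w}\ttop-down={t}\tbottom-up={b}")
```

In this toy sentence the top–down counts are front-loaded (the S and NP nodes are both assigned to "the"), while the bottom–up counts are back-loaded (the VP and S nodes are both assigned to "barked"). This is the timing difference that the two sets of regressors are meant to capture.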