Music can be interpreted by attributing syntactic relationships to sequential musical events, and, computationally, such musical interpretation is a combinatorial task analogous to syntactic processing in language. While this perspective has been primarily addressed in the domain of harmony, we focus here on rhythm in the Western tonal idiom, and we propose for the first time a framework for modeling the moment‐by‐moment execution of processing operations involved in the interpretation of music. Our approach is based on (1) a music‐theoretically motivated grammar formalizing the competence of rhythmic interpretation in terms of three basic types of dependency (preparation, syncopation, and split; Rohrmeier, 2020), and (2) psychologically plausible predictions, derived from the dependency locality theory (Gibson, 2000), about the complexity of the structural integration and memory storage operations necessary for parsing hierarchical dependencies. With a behavioral experiment, we exemplify an empirical implementation of the proposed theoretical framework. One hundred listeners were asked to reproduce the location of a visual flash presented while listening to three rhythmic excerpts, each exemplifying a different interpretation under the formal grammar. The hypothesized execution of syntactic‐processing operations was found to be a significant predictor of the observed displacement between the reported and the objective location of the flashes. Overall, this study presents a theoretical approach and a first empirical proof‐of‐concept for modeling the cognitive process that results in rhythmic interpretation as a form of syntactic parsing with algorithmic similarities to its linguistic counterpart.
Results from the present small‐scale experiment should not be read as a final test of the theory, but they are consistent with the theoretical predictions after controlling for several possible confounding factors and may form the basis for further large‐scale and ecological testing.