Human beings effortlessly perceive structural meaning from a biophysical signal such as speech or sign. An explanation of the processes underlying this phenomenon needs to account for both the properties of the signal and those of the neural architecture deploying linguistic knowledge. This article approaches the question from a mathematical perspective, providing a neurophysiologically grounded explanation of the process underlying linguistic structure building in the brain. To define the properties of the signal, we rely on the mathematical linguistics of DisCoCat (Coecke et al. 2010), a syntactically sensitive formalism of distributional meaning that makes no claims about the neural processing underlying sentence comprehension. The neuroscientific architecture derives from a cue-integration model of language comprehension (Martin 2020), in which the brain infers the latent structure of a cue or signal based on knowledge of the language, through a process of neural coordinate transform. First, we demonstrate how the DisCoCat formalism can interface with the neurophysiological process model and describe how the resulting incremental process model can return a formal description at each timestep. Second, we present an extension that shows how structure building from phonological segments to syllables can be modelled within a categorial grammar setup, and we integrate it with our process model. Third, we introduce a temporal metric interpretation of the transformations occurring within the extended DisCoCat formalism at each level of representation, a construction also known as categorical enrichment. As a result of this specification, we obtain a mechanistic account of neural oscillatory readouts during language comprehension.