Hierarchical structure and compositionality imbue human language with unparalleled expressive power and set it apart from other perception–action systems. However, neither formal nor neurobiological models account for how these defining computational properties might arise in a physiological system. I attempt to reconcile hierarchy and compositionality with principles from cell assembly computation in neuroscience; the result is an emerging theory of how the brain could convert distributed perceptual representations into hierarchical structures across multiple timescales while representing interpretable incremental stages of (de)compositional meaning. The model's architecture—a multidimensional coordinate system based on neurophysiological models of sensory processing—proposes that a manifold of neural trajectories encodes sensory, motor, and abstract linguistic states. Gain modulation, including inhibition, tunes the path in the manifold in accordance with behavior and is how latent structure is inferred. As a consequence, predictive information about upcoming sensory input during production and comprehension is available without a separate operation. The proposed processing mechanism is synthesized from current models of neural entrainment to speech, concepts from systems neuroscience and category theory, and a symbolic-connectionist computational model that uses time and rhythm to structure information. I build on evidence from cognitive neuroscience and computational modeling that suggests a formal and mechanistic alignment between structure building and neural oscillations, and moves toward unifying basic insights from linguistics and psycholinguistics with the currency of neural computation.