Understanding how the brain represents sensory information and triggers behavioural responses is a fundamental goal in neuroscience. Recent advances in neuronal recording techniques bring this goal closer, yet the resulting high-dimensional responses are challenging to interpret and to link to relevant variables. In this work, we introduce SPARKS, a model that generates low-dimensional latent representations of high-dimensional neural recordings. Using Hebbian learning, SPARKS adapts the self-attention mechanism of large language models to extract information from the timing of single spikes and the sequence in which neurons fire. Trained with a criterion inspired by predictive coding to enforce temporal coherence, our model produces interpretable embeddings that are robust across animals and sessions. Behavioural recordings can be used to inform the latent representations of the neural data, and we demonstrate state-of-the-art predictive capabilities across diverse electrophysiology and calcium imaging datasets from the motor, visual and entorhinal cortices. We also show that SPARKS can be applied to large neuronal networks, revealing the temporal evolution of visual information encoding across the hierarchy of the visual cortex. Overall, by integrating biological mechanisms into a machine learning model, we provide a powerful tool to study large-scale network dynamics. The capacity of SPARKS to generalize across animals and behavioural states suggests that it can estimate the internal latent generative model of the world in animals, paving the way towards a foundation model for neuroscience.