Routing Brain Traffic Through the Von Neumann Bottleneck: Parallel Sorting and Refactoring

The need for reproducible, credible, multiscale biological modeling has led to the development of standardized simulation platforms, such as the widely-used NEURON environment for computational neuroscience. Developing and maintaining NEURON over several decades has required attention to the competing needs of backwards compatibility, evolving computer architectures, the addition of new scales and physical processes, accessibility to new users, and efficiency and flexibility for specialists. In order to meet these challenges, we have now substantially modernized NEURON, providing continuous integration, an improved build system and release workflow, and better documentation. With the help of a new source-to-source compiler of the NMODL domain-specific language we have enhanced NEURON's ability to run efficiently, via the CoreNEURON simulation engine, on a variety of hardware platforms, including GPUs. Through the implementation of an optimized in-memory transfer mechanism this performance optimized backend is made easily accessible to users, providing training and model-development paths from laptop to workstation to supercomputer and cloud platform. Similarly, we have been able to accelerate NEURON's reaction-diffusion simulation performance through the use of just-in-time compilation. We show that these efforts have led to a growing developer base, a simpler and more robust software distribution, a wider range of supported computer architectures, a better integration of NEURON with other scientific workflows, and substantially improved performance for the simulation of biophysical and biochemical models.

Section: Introductionmentioning

confidence: 99%

Modernizing the NEURON Simulator for Sustainability, Portability, and Performance

Awile

Kumbhar

Cornu

et al. 2022

“…However, the reduced communication and update times here come at the cost of increased delivery times due to an additional indirection introduced with spike compression. Ongoing work targets the delivery phase (Pronold et al, 2021 , 2022 ) and gives a perspective for performance improvements in future releases.…”

Section: Resultsmentioning

confidence: 99%

“…We distinguish between simulators that run on conventional HPC systems and those that use dedicated neuromorphic hardware. Prominent examples of simulators for networks of spiking point-neurons are (Morrison et al, 2005b ; Gewaltig and Diesmann, 2007 ; Plesser et al, 2007 ; Helias et al, 2012 ; Kunkel et al, 2012 , 2014 ; Ippen et al, 2017 ; Kunkel and Schenck, 2017 ; Jordan et al, 2018 ; Pronold et al, 2021 , 2022 ) and (Goodman and Brette, 2008 ; Stimberg et al, 2019 ) using CPUs; (Yavuz et al, 2016 ; Knight and Nowotny, 2018 , 2021 ; Stimberg et al, 2020 ; Knight et al, 2021 ) and (Golosio et al, 2021 ) using GPUs; (Nageswaran et al, 2009 ; Richert et al, 2011 ; Beyeler et al, 2015 ; Chou et al, 2018 ) running on heterogeneous clusters; and the neuromorphic hardware (Furber et al, 2014 ; Rhodes et al, 2019 ). (Carnevale and Hines, 2006 ; Migliore et al, 2006 ; Lytton et al, 2016 ) and (Akar et al, 2019 ) aim for simulating morphologically detailed neuronal networks.…”

Section: Introductionmentioning

confidence: 99%

A Modular Workflow for Performance Benchmarking of Neuronal Network Simulations

Albers

Pronold

Kurth

et al. 2022

Self Cite

Modern computational neuroscience strives to develop complex network models to explain dynamics and function of brains in health and disease. This process goes hand in hand with advancements in the theory of neuronal networks and increasing availability of detailed anatomical data on brain connectivity. Large-scale models that study interactions between multiple brain areas with intricate connectivity and investigate phenomena on long time scales such as system-level learning require progress in simulation speed. The corresponding development of state-of-the-art simulation engines relies on information provided by benchmark simulations which assess the time-to-solution for scientifically relevant, complementary network models using various combinations of hardware and software revisions. However, maintaining comparability of benchmark results is difficult due to a lack of standardized specifications for measuring the scaling performance of simulators on high-performance computing (HPC) systems. Motivated by the challenging complexity of benchmarking, we define a generic workflow that decomposes the endeavor into unique segments consisting of separate modules. As a reference implementation for the conceptual workflow, we develop beNNch: an open-source software framework for the configuration, execution, and analysis of benchmarks for neuronal network simulations. The framework records benchmarking data and metadata in a unified way to foster reproducibility. For illustration, we measure the performance of various versions of the NEST simulator across network models with different levels of complexity on a contemporary HPC system, demonstrating how performance bottlenecks can be identified, ultimately guiding the development toward more efficient simulation technology.

“…For example, in large-scale networks, synaptic processing substantially dominates the computational load, and the irregular, random access pattern in retrieving the presynaptic data reduce a processor's cache hit rate and increases data access latencies (see e.g., Kunkel et al, 2014 ). The tools of trade here are algorithms that implement high parallelism in computations, “cache-friendly” data structures, and the application of techniques for latency hiding, such as data prefetching (Pronold et al, 2022 ). The proposed HNC node design aims to address these problems—which on conventional computer architectures are a consequence of the von Neumann bottleneck—by implementing performance-critical tasks in hardware.…”

Section: Discussionmentioning

confidence: 99%

A System-on-Chip Based Hybrid Neuromorphic Compute Node Architecture for Reproducible Hyper-Real-Time Simulations of Spiking Neural Networks

Trensch

Morrison

2022

Despite the great strides neuroscience has made in recent decades, the underlying principles of brain function remain largely unknown. Advancing the field strongly depends on the ability to study large-scale neural networks and perform complex simulations. In this context, simulations in hyper-real-time are of high interest, as they would enable both comprehensive parameter scans and the study of slow processes, such as learning and long-term memory. Not even the fastest supercomputer available today is able to meet the challenge of accurate and reproducible simulation with hyper-real acceleration. The development of novel neuromorphic computer architectures holds out promise, but the high costs and long development cycles for application-specific hardware solutions makes it difficult to keep pace with the rapid developments in neuroscience. However, advances in System-on-Chip (SoC) device technology and tools are now providing interesting new design possibilities for application-specific implementations. Here, we present a novel hybrid software-hardware architecture approach for a neuromorphic compute node intended to work in a multi-node cluster configuration. The node design builds on the Xilinx Zynq-7000 SoC device architecture that combines a powerful programmable logic gate array (FPGA) and a dual-core ARM Cortex-A9 processor extension on a single chip. Our proposed architecture makes use of both and takes advantage of their tight coupling. We show that available SoC device technology can be used to build smaller neuromorphic computing clusters that enable hyper-real-time simulation of networks consisting of tens of thousands of neurons, and are thus capable of meeting the high demands for modeling and simulation in neuroscience.