Optimizing Network Traffic for Spiking Neural Network Simulations on Densely Interconnected Many-Core Neuromorphic Platforms

Urgese, Gianvito; Barchi, Francesco; Macii, Enrico; Acquaviva, Andrea

doi:10.1109/tetc.2016.2579605

Cited by 29 publications

(22 citation statements)

References 28 publications


“…Our proposed neuron allocation method based on hypergraph partitioning offers an improvement to approaches employing graph partitioning (Heien et al, 2010) or clustering (Urgese et al, 2016). A hypergraph allows the total communication volume of the simulation to be modeled more accurately than using a normal graph (Devine et al, 2005; Deveci et al, 2015) which enables a better allocation of neurons to computing nodes to reduce connectivity between processes.…”

Section: Discussion (mentioning)

confidence: 99%
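
The graph-versus-hypergraph point in the statement above can be illustrated with a small sketch. It is not taken from either paper. The toy network, the allocation, and both function names are assumptions made for illustration, as is the communication model (one spike message per presynaptic neuron per remote process, regardless of how many of its targets live there).

```python
# Toy comparison of two cost models for a neuron-to-process allocation.
# The network, the allocation and the function names are illustrative
# assumptions, not code from the cited papers.

def edge_cut(synapses, part):
    """Graph model: count every synapse whose two ends sit on different processes."""
    return sum(1 for pre, post in synapses if part[pre] != part[post])


def connectivity_minus_one(synapses, part):
    """Hypergraph model: one hyperedge per presynaptic neuron.

    The cost of a hyperedge is the number of remote processes that host
    at least one target of that neuron (the "connectivity minus one" metric).
    """
    remote = {}
    for pre, post in synapses:
        procs = remote.setdefault(pre, set())
        if part[post] != part[pre]:
            procs.add(part[post])
    return sum(len(procs) for procs in remote.values())


# Neuron 0 projects to neurons 1..4; neuron 1 projects back to neuron 0.
synapses = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 0)]
# Neurons 0 and 1 live on process 0; neurons 2, 3 and 4 live on process 1.
part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}

print(edge_cut(synapses, part))                # 3 cut synapses
print(connectivity_minus_one(synapses, part))  # 1 remote process to notify
```

Under this assumed model, a spike of neuron 0 needs a single message to process 1, which the hypergraph metric reports, whereas the edge cut counts the three cut synapses separately and so overestimates the traffic.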

“…HRLSim (Minkovich et al, 2014) suggests assigning neurons based on how tightly connected they are but without implementation details. Urgese et al (2016) present an improvement to the default division of workload policy PACMAN in SpiNNaker (Galluppi et al, 2012). They use spectral clustering to group neurons into sub-populations, where tightly connected groups are kept in the same computational node (process).…”

Section: Introduction (mentioning)

confidence: 99%


Communication Sparsity in Distributed Spiking Neural Network Simulations to Improve Scalability

Fernandez-Musoles, Coca, Richmond. 2019. Front. Neuroinform.


In the last decade there has been a surge in the number of big science projects interested in achieving a comprehensive understanding of the functions of the brain, using Spiking Neuronal Network (SNN) simulations to aid discovery and experimentation. Such an approach increases the computational demands on SNN simulators: if natural scale brain-size simulations are to be realized, it is necessary to use parallel and distributed models of computing. Communication is recognized as the dominant part of distributed SNN simulations. As the number of computational nodes increases, the proportion of time the simulation spends in useful computing (computational efficiency) is reduced and therefore applies a limit to scalability. This work targets the three phases of communication to improve overall computational efficiency in distributed simulations: implicit synchronization, process handshake and data exchange. We introduce a connectivity-aware allocation of neurons to compute nodes by modeling the SNN as a hypergraph. Partitioning the hypergraph to reduce interprocess communication increases the sparsity of the communication graph. We propose dynamic sparse exchange as an improvement over simple point-to-point exchange on sparse communications. Results show a combined gain when using hypergraph-based allocation and dynamic sparse communication, increasing computational efficiency by up to 40.8 percentage points and reducing simulation time by up to 73%. The findings are applicable to other distributed complex system simulations in which communication is modeled as a graph network.



“…The kernel of the interconnection among all cores of all chips of the simulator is the router, specifically designed to deliver packets as fast as possible (0.1 µs per hop) [16]. The particular design of the router, despite limitations on the synchronous transmission of packets [17,18], allows transmission of two operative packet types-Multicast (MC) and Point to Point (P2P). The length of these packets can be up to 72 bits and can carry a 32 bits long payload.…”

Section: Spinnaker Network (mentioning)

confidence: 99%
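
The figures quoted in the statement above lend themselves to a quick back-of-the-envelope calculation. The sketch below is an assumption-laden illustration, not SpiNNaker software: the constant and function names are made up here, the hop latency is treated as a fixed best-case value with no congestion, and the non-payload size is simply the quoted maximum packet length minus the quoted payload length.

```python
# Back-of-the-envelope figures derived only from the quoted statement.
# Names are hypothetical; congestion and link differences are ignored.

HOP_LATENCY_NS = 100   # 0.1 microseconds per hop, as quoted
MAX_PACKET_BITS = 72   # maximum packet length, as quoted
PAYLOAD_BITS = 32      # payload length, as quoted


def route_latency_ns(hops):
    """Best-case router latency for a packet crossing `hops` routers."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return hops * HOP_LATENCY_NS


def non_payload_bits():
    """Bits of a maximum-length packet that are not payload."""
    return MAX_PACKET_BITS - PAYLOAD_BITS


print(route_latency_ns(10))  # 1000 ns for ten hops
print(non_payload_bits())    # 40 bits
```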

Benchmarking a Many-Core Neuromorphic Platform With an MPI-Based DNA Sequence Matching Algorithm

et al. 2019

Self Cite


SpiNNaker is a neuromorphic globally asynchronous locally synchronous (GALS) multi-core architecture designed for simulating a spiking neural network (SNN) in real-time. Several studies have shown that neuromorphic platforms allow flexible and efficient simulations of SNN by exploiting the efficient communication infrastructure optimised for transmitting small packets across the many cores of the platform. However, the effectiveness of neuromorphic platforms in executing massively parallel general-purpose algorithms, while promising, is still to be explored. In this paper, we present an implementation of a parallel DNA sequence matching algorithm implemented by using the MPI programming paradigm ported to the SpiNNaker platform. In our implementation, all cores available in the board are configured for executing in parallel an optimised version of the Boyer-Moore (BM) algorithm. Exploiting this application, we benchmarked the SpiNNaker platform in terms of scalability and synchronisation latency. Experimental results indicate that the SpiNNaker parallel architecture allows a linear performance increase with the number of used cores and shows better scalability compared to a general-purpose multi-core computing platform.


“…Then in the second step, these logic cores are placed onto physical cores-such process is defined as the core placement (see Figure 1(c)). As multi-chip many-core systems scale up, communication costs would be a concern in these decentralized systems, and partitioning and placement of computation onto cores heavily impact the efficiency of on-chip and off-chip communication [2,46,67,82]. Some work has been proposed to optimize the partitioning in the first step aiming to reduce required communication between logic cores.…”

Section: Introduction (mentioning)

confidence: 99%

“…Some work has been proposed to optimize the partitioning in the first step aiming to reduce required communication between logic cores. For example, Urgeses et al [82] present a partitioning methodology to optimize network traffic for spiking neural networks on neuromorphic many-core platforms; HyPar [75] searches a partition that minimizes the total communication of DNNs on an accelerator array. When it comes to the second step, there is a series of heuristic-based investigations in mapping applications, especially multi-media workloads, to 2D-mesh NoC architectures [32,33,47,58,64,68,69].…”

Section: Introduction (mentioning)

confidence: 99%

Core Placement Optimization for Multi-chip Many-core Neural Network Systems with Reinforcement Learning

Deng et al. 2020. ACM Trans. Des. Autom. Electron. Syst.


Multi-chip many-core neural network systems are capable of providing high parallelism benefited from decentralized execution, and they can be scaled to very large systems with reasonable fabrication costs. As multi-chip many-core systems scale up, communication latency related effects will take a more important portion in the system performance. While previous work mainly focuses on the core placement within a single chip, there are two principal issues still unresolved: the communication-related problems caused by the non-uniform, hierarchical on/off-chip communication capability in multi-chip systems, and the scalability of these heuristic-based approaches in a factorially growing search space. To this end, we propose a reinforcement-learning-based method to automatically optimize core placement through deep deterministic policy gradient, taking into account information of the environment by performing a series of trials (i.e., placements) and using convolutional neural networks to extract spatial features of different placements. Experimental results indicate that compared with a naive sequential placement, the proposed method achieves 1.99× increase in throughput and 50.5% reduction in latency; compared with the simulated annealing, an effective technique to approximate the global optima in an extremely large search space, our method improves the throughput by 1.22× and reduces the latency by 18.6%. We further demonstrate that our proposed method is capable to find optimal placements taking advantages of different communication properties caused by different system configurations, and work in a topology-agnostic manner. CCS Concepts: • Computer systems organization → Architectures; Parallel architectures; • Computing methodologies → Reinforcement learning;

