Numerical simulations of brain networks are a critical part of our efforts to understand brain function under normal and pathological conditions. For several decades, the community has developed many software packages and simulators to accelerate research in computational neuroscience. In this article, we select the three most popular simulators, as determined by the number of models in the ModelDB database: NEURON, GENESIS, and BRIAN, and perform an independent evaluation of them. In addition, we study NEST, one of the leading simulators of the Human Brain Project. First, we examine them based on one of the most important characteristics, the range of supported models. Our investigation reveals that brain network simulators may be biased toward supporting a specific set of models, although all of them tend to expand that range by providing a universal environment for the computational study of individual neurons and brain networks. Next, our investigation of computational architecture and efficiency indicates that all simulators compile the most computationally intensive procedures into binary code in order to maximize performance; however, not all of them provide a simple method for module development or guarantee efficient binary code. Third, a study of their amenability to high-performance computing reveals that NEST can map an existing model onto a cluster or multicore computer almost transparently, whereas NEURON requires code modification if a model developed for a single computer is to be mapped onto a computational cluster. Interestingly, parallelization is the weakest characteristic of BRIAN, which provides no support for cluster computations and only limited support for multicore computers. Fourth, we assess the level of user support and the frequency of usage of each simulator. Finally, we carry out an evaluation using two case studies: a large network with simplified neural and synaptic models and a small network with detailed models. These two case studies allow us to avoid bias toward any particular software package. The results indicate that BRIAN provides the most concise language in both cases. Furthermore, as expected, NEST mostly favors large network models, while NEURON is better suited to detailed models. Overall, the case studies reinforce our general observation that simulators are biased in computational performance toward specific types of brain network models.
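To illustrate the conciseness claim, the following is a minimal sketch of a small leaky integrate-and-fire network written in Brian 2. The network size, parameters, drive, and connectivity are illustrative placeholders, not the case-study models evaluated above.

```python
# A minimal leaky integrate-and-fire network in Brian 2, illustrating the
# concise model-description language. All values are illustrative only.
from brian2 import *

tau = 10*ms        # membrane time constant (placeholder)
v_rest = -70*mV    # resting potential (placeholder)
v_th = -50*mV      # spike threshold (placeholder)

eqs = 'dv/dt = (v_rest - v) / tau : volt (unless refractory)'
G = NeuronGroup(1000, eqs, threshold='v > v_th', reset='v = v_rest',
                refractory=5*ms, method='exact')
G.v = v_rest
P = PoissonInput(G, 'v', N=200, rate=20*Hz, weight=0.6*mV)  # external drive
S = Synapses(G, G, on_pre='v += 0.5*mV')
S.connect(p=0.02)              # sparse random recurrent connectivity
mon = SpikeMonitor(G)
run(100*ms)
print(f'{mon.num_spikes} spikes in 100 ms')
```

The entire model, including connectivity and recording, fits in roughly a dozen lines, which is the kind of brevity the evaluation above attributes to BRIAN.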
Moore's law for traditional electronic integrated circuits is facing growing challenges in both physics and economics. Among those challenges is the fact that on-chip bandwidth per unit of compute is dropping, whereas the energy needed for data movement keeps rising. We benchmark various interconnect technologies, including electrical, photonic, and plasmonic options. We contrast them with hybrid photonic-plasmonic interconnects (HyPPIs), in which plasmonics is used for active signal-manipulation devices and photonics for passive propagation elements, and we further propose a novel hybrid link that utilizes an on-chip laser for intrinsic modulation, thus bypassing electro-optic modulation. Our analysis shows that such hybridization overcomes the shortcomings of both purely photonic and purely plasmonic links. It is also superior across a variety of performance parameters, such as point-to-point latency, energy efficiency, throughput, energy-delay product, crosstalk coupling length, and bit flow density, a new metric that we define to reveal the tradeoff between footprint and performance. Our proposed HyPPIs show significantly superior performance compared with other links.
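Two of the metrics above reduce to simple arithmetic, sketched below. The energy-delay product is the standard product of energy per bit and latency; the bit-flow-density formula shown is one plausible reading (throughput per unit footprint) and is an assumption for illustration, since the paper defines its own metric. All numbers are placeholders.

```python
# Illustrative figure-of-merit calculations for a single interconnect link.
# All numeric values are placeholders, not figures from the paper, and the
# definition of bit flow density below is an assumed reading of the metric.

energy_per_bit = 50e-15   # J/bit (placeholder)
latency = 2e-12           # s, point-to-point (placeholder)
throughput = 100e9        # bit/s (placeholder)
footprint = 500.0         # um^2 (placeholder)

edp = energy_per_bit * latency               # energy-delay product, J*s/bit
bit_flow_density = throughput / footprint    # bit/s per um^2 (assumed def.)

print(f'EDP: {edp:.2e} J*s/bit')
print(f'bit flow density: {bit_flow_density:.2e} bit/s per um^2')
```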
Photonic data routing in optical networks is expected to overcome the limitations of electronic routers with respect to data rate, latency, and energy consumption. However, photonic routers suffer from high dynamic power consumption, cannot use multiple wavelength channels simultaneously when microrings are deployed, and have sizable footprints. Here we present a design for the first hybrid photonic-plasmonic, non-blocking, broadband 5×5 router based on three-waveguide silicon photonic-plasmonic 2×2 switches. The compactness of the router (footprint < 200 μm²) results in a short optical propagation delay (0.4 ps), enabling a high data capacity of up to 2 Tbps. The router has an average energy consumption of 0.1-1.0 fJ/bit, depending on DWDM or CWDM operation, enabled by the low electrical capacitance of the switch. The total average routing insertion loss of 2.5 dB is achieved via optical mode hybridization inside the 2×2 switches, which minimizes the coupling losses between the photonic and plasmonic sections of the router. Because no resonant features are required, the router's spectral bandwidth exceeds 100 nm, spanning the S, C, and L bands and supporting WDM applications. Taken together, this novel optical router combines multiple design features required in next-generation high-data-throughput optical networks and computing systems.
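The fJ/bit and picosecond figures can be sanity-checked with back-of-the-envelope physics: the standard dynamic-energy rule for a capacitive load driven by random NRZ data, E/bit ≈ CV²/4, and the group delay of an on-chip waveguide. The capacitance, drive voltage, path length, and group index below are assumed illustrative values, not figures from the paper.

```python
# Back-of-the-envelope checks for a capacitive electro-optic switch and a
# short on-chip optical path. All input values are assumptions chosen only
# to show that the quoted orders of magnitude are self-consistent.

C = 1e-15     # F: assumed switch capacitance (~1 fF)
V = 1.5       # V: assumed drive voltage
energy_per_bit = C * V**2 / 4          # standard rule for random NRZ data
print(f'energy: {energy_per_bit * 1e15:.2f} fJ/bit')   # ~0.56 fJ/bit

c0 = 3e8      # m/s, speed of light in vacuum
n_g = 4.0     # assumed group index of a silicon waveguide
path = 30e-6  # m: assumed on-chip optical path length
delay = path * n_g / c0                # group delay of the optical path
print(f'delay: {delay * 1e12:.2f} ps')                  # ~0.40 ps
```

Both results land in the ranges quoted above (sub-fJ/bit switching energy and a sub-picosecond propagation delay), under the stated assumptions.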
High-performance reconfigurable computing involves accelerating significant portions of an application using reconfigurable hardware. When the hardware tasks of an application cannot fit in an FPGA simultaneously, the task graph must be partitioned and scheduled into multiple FPGA configurations in a way that minimizes the total execution time. This article proposes the Reduced Data Movement Scheduling (RDMS) algorithm, which aims to improve the overall performance of hardware tasks by taking into account reconfiguration time, data dependencies between tasks, inter-task communication, and task resource utilization. The algorithm is based on dynamic programming. A mathematical analysis shows that, in the worst case, its execution time exceeds the optimal solution by a factor of at most about 1.6. Simulations on randomly generated task graphs indicate that the RDMS algorithm reduces inter-configuration communication time by 11% and 44%, respectively, compared with two other approaches that consider only data dependency or only hardware resource utilization. The practicality and efficiency of the proposed algorithm are demonstrated by simulating a task graph from a real-life application, N-body simulation, with bandwidth constraints and FPGA parameters taken from existing high-performance reconfigurable computers. Experiments on the SRC-6 are carried out to validate the approach.
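The abstract does not spell out the algorithm, but the core idea, packing ready tasks into each FPGA configuration so that resource utilization and on-chip communication are maximized, can be sketched as a 0/1-knapsack dynamic program. The sketch below is an illustrative toy, not the authors' RDMS implementation; the `area`/`comm` data layout and the scoring heuristic are assumptions.

```python
# Toy sketch of one configuration-filling step: choose ready tasks for the
# current FPGA configuration via a 0/1-knapsack DP, scoring each task by its
# resource use plus its communication with the other candidate tasks (a
# heuristic proxy for traffic kept on-chip). Not the authors' RDMS code.

def pick_configuration(ready, area, comm, capacity):
    """ready: list of task ids; area[t]: resource cost of task t;
    comm[(a, b)]: data volume between tasks a and b; capacity: FPGA budget."""
    def value(t):  # heuristic score: area plus comm with other candidates
        return area[t] + sum(v for (a, b), v in comm.items()
                             if t in (a, b) and a in ready and b in ready)

    n = len(ready)
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, t in enumerate(ready, 1):           # classic 0/1 knapsack DP
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if area[t] <= c:
                cand = best[i - 1][c - area[t]] + value(t)
                if cand > best[i][c]:
                    best[i][c] = cand

    chosen, c = [], capacity                   # backtrack the chosen set
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(ready[i - 1])
            c -= area[ready[i - 1]]
    return chosen

# Toy usage: three ready tasks on a 100-unit FPGA.
area = {'a': 40, 'b': 50, 'c': 30}
comm = {('a', 'b'): 8, ('b', 'c'): 2}
print(pick_configuration(['a', 'b', 'c'], area, comm, 100))  # -> ['b', 'a']
```

A full scheduler would repeat this step over the topologically ordered task graph, emitting one configuration per pass, and would also weigh reconfiguration time, which this toy omits.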
Modern Graphics Processing Units (GPUs) are widely used as application accelerators in the High Performance Computing (HPC) field due to their massive floating-point computational capability and highly data-parallel computing architecture. Contemporary high-performance computers equipped with co-processors such as GPUs primarily execute parallel applications using the Single Program Multiple Data (SPMD) model, which requires a balance between microprocessor and co-processor computing resources to ensure full system utilization. While the inclusion of GPUs in HPC systems provides more computing resources and significant performance improvements, an asymmetry between the number of GPUs and the number of microprocessors can leave the overall system's computing resources underutilized. In this paper, we propose a GPU resource virtualization approach that allows underutilized microprocessors to share the GPUs. We analyze the factors affecting parallel execution performance on GPUs and derive a theoretical performance estimate based on recent GPU architectures and the SPMD model. We then present the implementation details of the virtualization infrastructure, followed by an experimental verification of the proposed concepts on an NVIDIA Fermi GPU computing node. The results demonstrate a considerable performance gain over traditional SPMD execution without virtualization. Furthermore, the proposed solution enables full utilization of the asymmetrical system resources by sharing the GPUs among microprocessors, while incurring low overhead from the virtualization layer.
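The sharing pattern itself can be sketched in a few lines: several CPU processes funnel work through a single service process that owns the GPU and serializes access to it. This is only an illustration of the virtualization idea, not the paper's infrastructure; the GPU kernel is stubbed out with a placeholder computation so the sketch runs anywhere.

```python
# Minimal sketch of GPU sharing: four CPU worker processes submit requests
# to one GPU-owning service process via a queue. Illustration only; the
# "kernel" is a stand-in computation, not a real GPU call.
import multiprocessing as mp

def gpu_service(requests, results):
    """Single process that owns the GPU and serializes access to it."""
    while True:
        item = requests.get()
        if item is None:                        # shutdown sentinel
            break
        worker_id, payload = item
        results.put((worker_id, payload * 2))   # stand-in for a GPU kernel

def cpu_worker(worker_id, requests):
    requests.put((worker_id, worker_id + 10))   # submit work to shared GPU

if __name__ == '__main__':
    requests, results = mp.Queue(), mp.Queue()
    service = mp.Process(target=gpu_service, args=(requests, results))
    service.start()
    workers = [mp.Process(target=cpu_worker, args=(i, requests))
               for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    for _ in range(4):
        print(results.get())                    # one result per worker
    requests.put(None)                          # stop the service
    service.join()
```

In a real deployment the service process would batch or interleave kernels from different clients to keep the GPU saturated, which is where the performance gain over one-process-per-GPU SPMD execution comes from.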