The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. Although such a processor offers high computational energy efficiency and parallel scalability, developing effective programming models that address the unique architecture features has presented many challenges. We present here a distributed shared memory (DSM) model supported in software transparently using C++ templated metaprogramming techniques. The approach offers an extremely simple parallel programming model well suited for the architecture. Initial results are presented that demonstrate the approach and provide insight into the efficiency of the programming model and also the ability of the NoC to support a DSM without explicit control over data movement and localization.

The development of solutions for performance-portable code remains an open challenge of great interest in computer science as it is applied to high-performance computing. At issue is not the ability to achieve the maximum theoretical performance for every algorithm comprising a given software package, since this will always require heroic efforts and some degree of architecture-specific customization of software. At present, it is proving difficult to achieve even relatively good performance measured against the capabilities of a given parallel architecture. In some cases, non-portable code is required regardless of performance objectives. The Epiphany processor architecture has provided an example of the challenges faced in parallel programmability that must be addressed to support performance-portable code.

The Adapteva Epiphany RISC array architecture [1] is a scalable 2D array of low-power RISC cores with minimal uncore functionality supported by an on-chip 2D mesh network for fast inter-core communication. The Epiphany-III architecture is scalable to 4,096 cores and represents an example of an architecture designed for power efficiency at extreme on-chip core counts. Processors based on this architecture exhibit good performance/power metrics [2] and scalability via a 2D mesh network [3][4], but require a suitable programming model to fully exploit the architecture. A 16-core Epiphany-III processor [5] has been integrated into the Parallella mini-computer platform [6], where the RISC array is supported by a dual-core ARM CPU and asymmetric shared-memory access to off-chip global memory. We have recently published results for threaded MPI [7], an OpenSHMEM programming model for Epiphany [8][9], a hybrid programming model [10], and other advances in runtime performance and interoperability [11].

RISC array processors, such as those based on the Epiphany architecture, may offer significant computational power efficiency in the near future, given requirements for increased core counts, including long-term plans for exascale platforms. The power efficiency of the Epiphany architecture has been specifically identified as both a guide and prospective architecture for such platforms [12]. The Epi...
Computational and Information Sciences Directorate, ARL. Approved for public release; distribution is unlimited.
The finite-difference time-domain (FDTD) method is a full-wave electromagnetic solution. FDTD is computationally intensive, with performance depending critically on optimizations of instruction and memory access patterns. We examine the use of compile-time type selection to optimize data layouts for different processors, allowing for better software maintainability in the context of rapidly evolving computing architectures. The method employs C++ templated metaprogramming to make the greatest possible use of an optimizing compiler while maintaining a single-source implementation of the computational kernels.