Parallel programming model for the Epiphany many-core coprocessor using threaded MPI

Ross, James A.; Richie, David A.; Park, Song J.; Shires, Dale R.

doi:10.1016/j.micpro.2016.02.006

Cited by 11 publications

(7 citation statements)

References 17 publications

“…An Epiphany accelerated, complete BLAS library was instantiated by the use of the BLIS framework. The performance of the Matrix-Matrix multiplication kernel achieved was better than in any other implementation before (as to the author's knowledge), when program loading and initialization are not taken into account (which is the standard in pre- [8]). When trying to get a more practical kernel, to be used as a Linux service, the performance gets lower, due to the interprocess communication (which could, most likely, be improved), but gives still an interesting result for a first BLAS implementation.…”

Section: Discussion (mentioning)

confidence: 77%

“…The idea for the micro-kernel was to use a "SUMMA-like" algorithm [4], that could improve the performance over current implementations (that use Cannon's [5]). The achieved results, for the Matrix-Matrix Multiplication performance, were the best for this platform that are presently known to the author [6] [7] [8] (if the host processing and off-chip data transfer is taken into account).…”

Section: Introduction (mentioning)

confidence: 99%

Generation of the Single Precision BLAS Library for the Parallella Platform, with Epiphany Co-processor Acceleration, Using the BLIS Framework

Tasende

2016

2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing,

The Parallella is a hybrid computing platform that came into existence as the result of a Kickstarter project by Adapteva. It is composed of the high performance, energy-efficient, manycore architecture, Epiphany chip (used as co-processor) and one Zynq-7000 series chip, which normally runs a regular Linux OS version, serves as the main processor, and implements "glue logic" in its internal FPGA to communicate with the many interfaces in the Parallella.

“…Conceptually, the greatest challenges for effectively using the Epiphany cores are from the limited SRAM as well as the efficient execution of inter-processor communication primitives. In previous work, we demonstrated the use of a threaded MPI implementation to achieve high performance using a standard parallel programming API for the Epiphany architecture [2], [3]. The OpenSHMEM 1.2 standard provides excellent one-sided communication routines well-suited for Epiphany when executed in a SPMD manner.…”

Section: Introduction (mentioning)

confidence: 99%

Implementing OpenSHMEM for the Adapteva Epiphany RISC Array Processor

Ross, Richie

2016

Procedia Computer Science

Self Cite

The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). With fully divergent cores capable of MIMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to partitioned global address space (PGAS) parallel programming models. Following an investigation into the use of two-sided communication using threaded MPI, one-sided communication using SHMEM is being explored. Here we present work in progress on the development of an OpenSHMEM 1.2 implementation for the Epiphany architecture.

“…This software stack was eventually refactored to provide a direct interface to Epiphany [4] providing more consistent semantics than those found in the eSDK as well as Pthreads support extended to a heterogeneous host-coprocessor platform. These features enabled the development of threaded MPI for Epiphany which provided the first demonstration of high performance benchmarks using a standard parallel programming API for Epiphany [5], [6]. Subsequently, this same software stack has supported the development of the ARL OpenSHMEM for Epiphany for which details will be reported elsewhere.…”

Section: Introduction (mentioning)

confidence: 99%

Advances in Run-time Performance and Interoperability for the Adapteva Epiphany Coprocessor

Richie, Ross

2016

Procedia Computer Science

Self Cite

The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). The architecture presents many features and constraints which contribute to software design challenges for the application developer. Addressing these challenges within the software stack that supports application development is critical to improving productivity and expanding the range of applications for the architecture. We report here on advances that have been made in the COPRTHR-2 software stack targeting the Epiphany architecture that address critical issues identified in previous work. Specifically, we describe improvements that bring greater control and precision to the design of compact compiled binary programs in the context of the limited per-core local memory of the architecture. We describe a new design for run-time support that has been implemented to dramatically improve the program load and execute performance and capabilities. Finally, we describe developments that advance host-coprocessor interoperability to expand the functionality available to the application developer.
