2016
DOI: 10.1016/j.micpro.2016.02.006

Parallel programming model for the Epiphany many-core coprocessor using threaded MPI

Abstract: The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point calculations as well as parallel scalability. Yet despite the interesting architectural features, a compelling programming model has not been presented to date. This paper demonstrates an efficient parallel programming model for the Epiphany architecture based on the Message Pass…

Cited by 11 publications (7 citation statements)
References 17 publications
“…An Epiphany accelerated, complete BLAS library was instantiated by the use of the BLIS framework. The performance of the Matrix-Matrix multiplication kernel achieved was better than in any other implementation before (as to the author's knowledge), when program loading and initialization are not taken into account (which is the standard in pre- [8]). When trying to get a more practical kernel, to be used as a Linux service, the performance gets lower, due to the interprocess communication (which could, most likely, be improved), but gives still an interesting result for a first BLAS implementation.…”
Section: Discussion (citation type: mentioning)
confidence: 77%
“…The idea for the micro-kernel was to use a "SUMMA-like" algorithm [4], that could improve the performance over current implementations (that use Cannon's [5]). The achieved results, for the Matrix-Matrix Multiplication performance, were the best for this platform that are presently known to the author [6] [7] [8] (if the host processing and off-chip data transfer is taken into account).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Conceptually, the greatest challenges for effectively using the Epiphany cores are from the limited SRAM as well as the efficient execution of inter-processor communication primitives. In previous work, we demonstrated the use of a threaded MPI implementation to achieve high performance using a standard parallel programming API for the Epiphany architecture [2], [3]. The OpenSHMEM 1.2 standard provides excellent one-sided communication routines well-suited for Epiphany when executed in a SPMD manner.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…This software stack was eventually refactored to provide a direct interface to Epiphany [4] providing more consistent semantics than those found in the eSDK as well as Pthreads support extended to a heterogeneous host-coprocessor platform. These features enabled the development of threaded MPI for Epiphany which provided the first demonstration of high performance benchmarks using a standard parallel programming API for Epiphany [5], [6]. Subsequently, this same software stack has supported the development of the ARL OpenSHMEM for Epiphany for which details will be reported elsewhere.…”
Section: Introduction (citation type: mentioning)
confidence: 99%