Reverse Offload Programming on Heterogeneous Systems

Chen, Cheng; Yang, Wenxiang; Wang, Fang; Zhao, Dan; Liu, Yang; Deng, Liang; Yang, Chi

doi:10.1109/access.2019.2891740

Cited by 4 publications

(3 citation statements)

References 34 publications


“…Cha et al [17] proposed Virtual Edge, a new method to promote collaborative vehicular edge computing. A reverse offload model was developed to reduce the overhead of moving data between different memory areas [18]. In [19], Cheng et al studied task offloading strategies and wireless resource allocation in multi-user and multi-MEC server systems based on orthogonal frequency division multiplexing access.…”

Section: Computation Offloading (mentioning)

confidence: 99%

Efficient task offloading using particle swarm optimization algorithm in edge computing for industrial internet of things

You

Tang

2021

J Cloud Comp

As a new form of computing based on the core technology of cloud computing and built on edge infrastructure, edge computing can handle computing-intensive and delay-sensitive tasks. In mobile edge computing (MEC) assisted by 5G technology, offloading the computing tasks of edge devices to edge servers in the edge network can effectively reduce delay. Designing a reasonable task offloading strategy that meets users’ needs in a resource-constrained multi-user, multi-MEC system is a challenging issue. For the industrial internet of things (IIoT) environment, considering the rapid increase in industrial edge devices and the heterogeneous edge servers, a particle swarm optimization (PSO)-based task offloading strategy is proposed to offload tasks from resource-constrained edge devices to edge servers in an energy-efficient, low-delay manner. A multi-objective optimization problem that considers time delay, energy consumption and task execution cost is formulated. The fitness function of a particle represents the total cost of offloading all tasks to different MEC servers. The PSO-based offloading strategy is compared with the genetic algorithm (GA) and the simulated annealing algorithm (SA) through simulation experiments. The experimental results show that the PSO-based task offloading strategy can reduce the delay of the MEC server, balance the energy consumption of the MEC server, and effectively achieve reasonable resource allocation.
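The idea of a particle whose fitness is the total cost of an offloading plan can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `total_cost`, `decode` and `pso_offload`, the linear delay, energy and price models, the equal weights, and the choice to encode a plan as one continuous coordinate per task that is truncated to a server index.

```python
import random


def total_cost(assignment, task_cycles, server_speed, server_power, server_price,
               weights=(1.0, 1.0, 1.0)):
    """Weighted sum of delay, energy and execution cost for one offloading plan.

    assignment[i] is the index of the server that runs task i. The cost
    models are illustrative assumptions, not the paper's formulas.
    """
    w_delay, w_energy, w_price = weights
    busy = [0.0] * len(server_speed)      # accumulated run time per server
    energy = price = 0.0
    for task, server in enumerate(assignment):
        run_time = task_cycles[task] / server_speed[server]
        busy[server] += run_time
        energy += run_time * server_power[server]
        price += run_time * server_price[server]
    delay = max(busy)                     # tasks on one server run back to back
    return w_delay * delay + w_energy * energy + w_price * price


def decode(position, n_servers):
    """Map a continuous particle position to one server index per task."""
    return [min(n_servers - 1, max(0, int(x))) for x in position]


def pso_offload(task_cycles, server_speed, server_power, server_price,
                n_particles=20, n_iters=100, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a low-cost plan with a basic global-best PSO (hypothetical setup)."""
    rng = random.Random(seed)
    n_tasks, n_servers = len(task_cycles), len(server_speed)

    def fitness(pos):
        return total_cost(decode(pos, n_servers), task_cycles,
                          server_speed, server_power, server_price)

    positions = [[rng.uniform(0, n_servers) for _ in range(n_tasks)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_tasks for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]
    best_fit = [fitness(p) for p in positions]
    g = min(range(n_particles), key=best_fit.__getitem__)
    global_pos, global_fit = best_pos[g][:], best_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c1 * r1 * (best_pos[i][d] - positions[i][d])
                                    + c2 * r2 * (global_pos[d] - positions[i][d]))
                # Clamp so the position stays inside [0, n_servers].
                positions[i][d] = min(float(n_servers),
                                      max(0.0, positions[i][d] + velocities[i][d]))
            fit = fitness(positions[i])
            if fit < best_fit[i]:
                best_pos[i], best_fit[i] = positions[i][:], fit
                if fit < global_fit:
                    global_pos, global_fit = positions[i][:], fit
    return decode(global_pos, n_servers), global_fit
```

Calling `pso_offload([10, 20], [10, 5], [1, 1], [0, 0])` returns a server index per task together with the cost of that plan. A real study would replace the toy cost models with measured transmission delay, device energy and pricing, and would compare the result against GA and SA baselines as the abstract describes.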

“…This replacement library can be shipped in Linux distributions LLVM packages, which lowers the entry barrier for OpenMP offloading, because no vendor-specific SDKs are required. Although implementations for reverse offloading for heterogeneous systems are available [10], we presented, to the best of our knowledge, the first OpenMP implementation which gives the programmer full flexibility for target device offloading from the host system to the accelerator card or vice versa. The OpenMP Offloading evaluation suite presented in the work of Diaz et al [13] was a great support for us in order to improve and validate our offloading implementations for SX-Aurora TSUBASA.…”

Section: Related Work (mentioning)

confidence: 99%

Evaluating the Performance of OpenMP Offloading on the NEC SX-Aurora TSUBASA Vector Engine

2021

JSFI

The NEC SX-Aurora TSUBASA vector engine (VE) follows the tradition of long vector processors for high-performance computing (HPC). The technology combines vector computing capabilities with the popularity of the standard x86 architecture by integrating the vector engine as an accelerator. To decrease the burden of porting code to different accelerator types, the OpenMP specification is designed to be a single parallel programming model for all of them. Besides the availability of compiler and runtime implementations, functionality and performance are important for the usability and acceptance of this paradigm. In this work, we present LLVM-based solutions for OpenMP target device offloading from the host to the vector engine and vice versa (reverse offloading). For this purpose, we use our source-to-source transformation tool sotoc as well as the native LLVM-VE code path. We assess the functionality and present the first performance numbers of real-world HPC kernels. We discuss the advantages and disadvantages of the different approaches and show that our implementation is competitive with other GPU OpenMP runtime implementations. Our work gives scientific programmers new opportunities and flexibility for the development of scalable OpenMP offloading applications for SX-Aurora TSUBASA.

“…Vesely et al [55] discuss the support of operating system calls in GPGPUs. In addition, Chen et al [56] propose to use the accelerators as a host and the regular processors as accelerators for general purpose work offloading. These works propose extending some capabilities of accelerators (GPUs and Intel Many Integrated Core) to allow a more flexible programming.…”

Section: Related Work (mentioning)

confidence: 99%

Breaking host-centric management of task-based parallel programming models

Pons

Heterogeneous platforms have become popular as a way to increase the computational power of systems within a constrained power budget. They are present in many systems, from embedded platforms and mobile devices to high-end servers and clusters. However, the co-processors are managed following a master-slave model in which the general-purpose CPU drives the rest of the elements. This management limits the system's possibilities, as not all parts of an application are suitable for execution in an accelerator. This thesis presents different proposals to enhance the usage of co-processors in task-based parallel programming models, which are a powerful tool for easily programming applications for heterogeneous platforms. The first proposal enhances task-based systems with asynchronous, concurrent, and parameterizable behavior. The improvements span the full stack, from the programming model level down to the low-level communications between the libraries and the co-processors. The evaluation shows that the implemented improvements boost application performance because applications can be easily tuned for the platform they run on. The second proposal adds support for task spawn and synchronization in co-processors. Offloaded tasks can create child tasks that target other architectures or remain inside the co-processor. This allows programmers to implement applications easily and effectively. The evaluation shows the efficiency of the implementation in terms of latency and power consumption. The results show that applications can increase their performance and optimize their power consumption just by moving the task spawn from the host threads to the co-processor. This is thanks to the low-latency task management inside the co-processors, which also reduces the communication between the host and the co-processor. The third proposal extends task-based programming models with concepts of recurrent workloads. The regular task syntax has been extended with new clauses to label the recurrent tasks and provide the needed information to the runtime. The evaluation shows an increase in application programmability thanks to the new syntax, which allows recurrent systems to be specified with much less code and better accuracy. Also, managing task repetitions and periods directly in the co-processors gives almost zero-latency management that can handle any task granularity.

Heterogeneous systems have become popular because they increase computing power without increasing energy consumption. These systems range from embedded platforms and mobile devices to high-performance servers and clusters. In all of them, co-processor management follows the primary-secondary pattern in which the general-purpose computing unit (CPU) drives the rest of the elements. This management limits the possibilities of the systems and limits the parts of applications that can be executed on the accelerators. This thesis presents different proposals to improve the use of co-processors within task-based parallel programming models. These programming models are a very powerful tool for easily programming applications for heterogeneous systems. The first proposal improves task-based programming models through asynchronous, concurrent, and parameterizable approaches. The improvements span all levels, from the programming model down to the low-level communications between the libraries and the co-processors. The evaluation results show that the improvements increase application performance because they allow applications to be adapted easily to the execution platforms. The second proposal adds support for task creation and synchronization inside the co-processors. Tasks sent to the co-processors can create child tasks for the same co-processor or for other elements of the system. This makes application programming more flexible and easier. The evaluation shows the efficiency of the proposal with respect to latency and energy consumption. The results reveal that applications can increase their performance and optimize their energy consumption by creating tasks directly inside the co-processors. The improvement is due to the low latency of task management inside the co-processors, which also reduces communication between the CPU and the co-processor. The third proposal extends the capabilities of task-based programming models by introducing concepts from recurrent systems. The basic syntax of a task is extended with new clauses to distinguish recurrent tasks and give the runtime the necessary information. The evaluation of the proposal shows an improvement in application programmability thanks to the new syntax, which allows recurrent systems to be created with less code and greater precision. Direct management of the repetitions and periods of recurrent tasks inside the co-processors results in minimal latency that allows any task granularity.
