2015
DOI: 10.1007/s11227-015-1449-1

Design space exploration of hardware task superscalar architecture

Abstract: For current high performance computing systems, exploiting concurrency is a serious and important challenge. Recently, several dynamic software task management mechanisms have been proposed; in particular, task-based dataflow programming models, which benefit from dataflow principles to improve task-level parallelism and overcome the limitations of static task management systems. However, these programming models rely on software-based dependency analysis, which is inherently slow, and this limits …
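
To make the task-based dataflow style the abstract refers to concrete, here is a minimal sketch using OpenMP 4.0 depend clauses, the standardized analogue of OmpSs in/out annotations. This is an illustrative sketch, not code from the paper: the runtime derives the producer/consumer ordering from the annotations instead of explicit synchronization.

```c
/* Illustrative sketch (not from the paper): task-based dataflow with
 * OpenMP 4.0 depend clauses. The runtime builds the dependence graph
 * and runs the consumer only after the producer has written x.
 * Build with e.g. gcc -fopenmp. */
#include <stdio.h>

int main(void) {
    int x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: x)  /* producer task: writes x */
        x = 42;

        #pragma omp task depend(in: x)   /* consumer: ordered after producer */
        printf("x = %d\n", x);

        #pragma omp taskwait             /* wait for both tasks */
    }
    return 0;
}
```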

Cited by 3 publications (2 citation statements)
References 34 publications
“…Before executing a task, it should be analysed using a dependence graph to ensure correct execution [13,14]. Next, the scheduler determines a target, such as CPU cores, GPUs and FPGAs, for executing the task.…”
Section: OmpSs Programming Model (mentioning, confidence: 99%)
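
To illustrate what the "analysed using a dependence graph" step involves, here is a hedged sketch of how a software runtime typically derives task dependences: a table maps each datum to its last writer, and each new task is linked after the writers of its inputs. All names are illustrative, and for brevity only read-after-write dependences are tracked; real runtimes also handle WAR/WAW hazards.

```c
/* Hedged sketch of software dependence analysis: map each datum to its
 * last writer and link new tasks after the writers of their inputs.
 * Illustrative only; tracks read-after-write dependences, not WAR/WAW. */
#include <stdio.h>

#define MAX_DEPS 8
#define MAX_DATA 16

typedef struct Task {
    int id;
    int ndeps;
    struct Task *deps[MAX_DEPS];   /* predecessors in the dependence graph */
} Task;

static struct { void *addr; Task *last_writer; } writers[MAX_DATA];
static int nwriters = 0;

/* Task 't' reads 'addr': add an edge from the last writer, if any. */
static void add_input(Task *t, void *addr) {
    for (int i = 0; i < nwriters; i++)
        if (writers[i].addr == addr)
            t->deps[t->ndeps++] = writers[i].last_writer;
}

/* Task 't' writes 'addr': it becomes the new last writer. */
static void add_output(Task *t, void *addr) {
    for (int i = 0; i < nwriters; i++)
        if (writers[i].addr == addr) { writers[i].last_writer = t; return; }
    writers[nwriters].addr = addr;
    writers[nwriters++].last_writer = t;
}

int main(void) {
    int x;
    Task producer = { .id = 0 }, consumer = { .id = 1 };
    add_output(&producer, &x);  /* producer writes x */
    add_input(&consumer, &x);   /* consumer reads x: edge producer->consumer */
    printf("task %d depends on task %d\n",
           consumer.id, consumer.deps[0]->id);
    return 0;
}
```

Because every task insertion walks these structures serially, the analysis sits on the critical path of task creation; this is the software bottleneck the paper's hardware task superscalar pipeline is designed to remove.
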
“…Effective parallel implementation of high-computation applications with big-data workloads, utilizing the parallel capabilities of both CPUs and GPUs, is highly challenging. Using the OmpSs programming model, programmers focus on defining application parallelism at different levels [13,14], and concentrate on optimizations considering parallel execution in heterogeneous processors such as CPU-GPU architectures. The run-time system is responsible for parallelism extraction and scheduling.…”
Section: Introduction (mentioning, confidence: 99%)
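
As a hedged illustration of this division of labour, the sketch below uses OmpSs-style target/task annotations: the programmer declares data directions and a device, and the runtime extracts the dependence graph and schedules the task. The exact clause spelling varies across OmpSs versions and requires the Mercurium compiler with the Nanos++ runtime; compiled as plain C, the pragmas are ignored and the code runs sequentially.

```c
/* Hedged OmpSs-style sketch (assumption: Mercurium compiler + Nanos++
 * runtime; clause spelling varies across OmpSs versions). As plain C
 * the pragmas are ignored and the program runs sequentially. */
#include <stdio.h>

#define N 4

/* Task annotated for the SMP device; with device(cuda) and a CUDA
 * kernel body, the runtime could schedule the same task on a GPU. */
#pragma omp target device(smp)
#pragma omp task in([n] a) out([n] b)
void scale(const float *a, float *b, int n) {
    for (int i = 0; i < n; i++) b[i] = 2.0f * a[i];
}

int main(void) {
    float a[N] = {1, 2, 3, 4}, b[N];
    scale(a, b, N);        /* spawns a task; inputs/outputs are tracked */
    #pragma omp taskwait   /* runtime waits until b has been produced */
    for (int i = 0; i < N; i++) printf("%g ", b[i]);
    printf("\n");
    return 0;
}
```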