2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2019
DOI: 10.1109/ipdpsw.2019.00068

Toward an Analytical Performance Model to Select between GPU and CPU Execution

Cited by 6 publications (4 citation statements)
References 29 publications
“…Two co-execution schemes are extensively used in the literature: data partitioning and task partitioning. Tasks partitioning has been carried out offline using performance analytical models [3] or using machine learning-based approaches [6,10,13,17]. OpenABLext [30], an extension of the OpenABL framework, carries out an online partitioning of the workload.…”
Section: Related Work (mentioning)
confidence: 99%
“…We based the evaluation on an extended version of OpenABL which supports generating OpenCL code. Despite the fact that data transfer overhead was avoided, co-execution was not feasible for the tested applications on the iCPU-GPU system. Similarly to the MV application, this is caused by the large performance deviation between the CPU and the iGPU.…”
Section: Case Study 2: OpenABL (mentioning)
confidence: 99%
“…First, we plan to develop an approach that automatically determines the worklist size for a given program and device for ReexeStrategy. Second, it would be worth studying how to select the appropriate device for exploration [Chikin et al 2019]. Third, we plan on finding the optimal memory layout for our worklists [Franco et al 2017;Majeti et al 2016].…”
Section: Limitations and Future Work (mentioning)
confidence: 99%
“…Here, as in other auto-parallelization the importance of augmenting static analysis with runtime values is recognised, e.g. [9]. This approach compiles parallel CPU and GPU code for the loop nest and uses a staged cost model to select what code to run.…”
Section: Related Work (mentioning)
confidence: 99%