2016
DOI: 10.1016/j.parco.2016.05.006

MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL

Abstract: The OpenCL specification tightly binds a command queue to a specific device. For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort that requires a thorough understanding of the underlying device architectures and kernels in the program. In this paper, we propose to add scheduling attributes to the OpenCL context and command queue objects that can be leveraged by an intelligent runtime scheduler to automatically perform ideal queue-device mapping. Our …
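
To make the queue-device mapping problem concrete, here is a minimal sketch of what an automatic mapper has to decide. It is not MultiCL's scheduler or API, which the abstract does not describe. The cost matrix, the greedy policy, and the name map_queues_to_devices are all assumptions made for illustration: each queue is given to the device on which it would finish earliest, given the load already placed there.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cost model (an assumption, not part of OpenCL or MultiCL):
// est_time[q][d] is the estimated run time of queue q's work on device d,
// for example obtained by profiling.
using CostMatrix = std::vector<std::vector<double>>;

// Greedy mapping: visit the queues in order and assign each one to the
// device on which it would finish earliest, given what is already assigned.
// Returns mapping[q] = index of the chosen device.
std::vector<std::size_t> map_queues_to_devices(const CostMatrix& est_time) {
    if (est_time.empty()) return {};
    const std::size_t num_devices = est_time.front().size();
    assert(num_devices > 0);

    std::vector<double> busy_until(num_devices, 0.0);  // accumulated load per device
    std::vector<std::size_t> mapping;
    mapping.reserve(est_time.size());

    for (const auto& row : est_time) {
        assert(row.size() == num_devices);
        std::size_t best = 0;
        double best_finish = std::numeric_limits<double>::infinity();
        for (std::size_t d = 0; d < num_devices; ++d) {
            const double finish = busy_until[d] + row[d];
            if (finish < best_finish) {
                best_finish = finish;
                best = d;
            }
        }
        busy_until[best] = best_finish;
        mapping.push_back(best);
    }
    return mapping;
}
```

In plain OpenCL the programmer makes this choice by hand when creating each command queue; a runtime scheduler of the kind the abstract proposes would make it on the programmer's behalf.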

Cited by 23 publications (13 citation statements)
References: 22 publications
“…To address the limitations and the idle-cycles introduced by the multi-devices in-order execution mode of OpenCL, a number of frameworks has been proposed. For instance, VirtCL [53], SnuCL [34], PySchedCL [21], FluidiCL [42], MultiCL [2], EngineCL [39] and SOCL [26] focus on single or multi-task level scheduling for standalone or partitioned OpenCL applications. A common denominator of all aforementioned frameworks is the fact that they solely focus on non-managed applications, thereby leaving the area of managed languages unexplored.…”
Section: OpenCL Execution Modes (mentioning)
confidence: 99%
“…
    auto kernel = file_read("binomial.cl");
    auto samples = 16777216; auto steps = 254;
    auto steps1 = steps + 1; auto lws = steps1;
    auto samplesBy4 = samples / 4;
    auto gws = lws * samplesBy4;
    vector<cl_float4> in(samplesBy4);
    vector<cl_float4> out(samplesBy4);

    binomial_init_setup(samplesBy4, in, out);
    // … (listing lines 10-17 are missing from this excerpt)
    program.in(in);
    program.out(out);

    program.out_pattern(1, lws);

    program.kernel(kernel, "binomial_opts");
    program.arg(0, steps); // positional by index
    program.arg(in); // aggregate
    program.arg(out);
    program.arg(steps1 * sizeof(cl_float4),
                ecl::Arg::LocalAlloc);
    program.arg(4, steps * sizeof(cl_float4),
                ecl::Arg::LocalAlloc);

    engine.use(std::move(program));

    engine.run();

    // if (engine.has_errors()) // [Optional lines]
    //   for (auto& err : engine.get_errors())
    //     show or process errors

Listing 1: EngineCL API used in Binomial benchmark.…”
Section: Case 1: Using Only One Device (mentioning)
confidence: 99%
“…The experiments have been carried out using two different machines to validate both code portability and performance of EngineCL.

    auto kernel = file_read("nbody.cl");
    auto gpu_kernel = file_read("nbody.gpu.cl");
    auto phi_kernel_bin =
        file_read_binary("nbody.phi.cl.bin");
    auto bodies = 512000; auto del_t = 0.005f;
    auto esp_sqr = 500.0f; auto lws = 64;
    auto gws = bodies;
    vector<cl_float4> in_pos(bodies);
    vector<cl_float4> in_vel(bodies);
    vector<cl_float4> out_pos(bodies);
    vector<cl_float4> out_vel(bodies);

    nbody_init_setup(bodies, del_t, esp_sqr, in_pos,
                     in_vel, out_pos, out_vel);

    ecl::EngineCL engine;
    engine.use(ecl::Device(0, 0),
               ecl::Device(0, 1, phi_kernel_bin),
               ecl::Device(1, 0, gpu_kernel));

    engine.work_items(gws, lws);

    auto props = { 0.08, 0.3 };
    engine.scheduler(ecl::Scheduler::Static(props));

    ecl::Program program;
    program.in(in_pos);
    program.in(in_vel);
    program.out(out_pos);
    program.out(out_vel);

    program.kernel(kernel, "nbody");
    program.args(in_pos, in_vel, bodies, del_t,
                 esp_sqr, out_pos, out_vel);

    engine.program(std::move(program));

    engine.run();

Listing 2: EngineCL API used in NBody benchmark.…”
Section: System Setup (mentioning)
confidence: 99%
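
Listing 2 passes two proportions, { 0.08, 0.3 }, to a static scheduler while registering three devices. The sketch below shows one way such a static split could be computed. It is not EngineCL's implementation. The function name static_split, the rule that the last device receives whatever remains, and the rounding to whole work-groups are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of EngineCL): split gws work-items among
// props.size() + 1 devices. Device i receives roughly props[i] of the work,
// rounded to a whole number of work-groups of size lws; the last device
// receives the remainder (an assumption about how the proportions are read).
std::vector<std::size_t> static_split(std::size_t gws, std::size_t lws,
                                      const std::vector<double>& props) {
    assert(lws > 0 && gws % lws == 0);  // global size must be a multiple of local size
    const std::size_t total_groups = gws / lws;
    std::size_t remaining = total_groups;

    std::vector<std::size_t> sizes;
    sizes.reserve(props.size() + 1);
    for (double p : props) {
        assert(p >= 0.0 && p <= 1.0);
        // Round to the nearest whole work-group, never exceeding what is left.
        const auto want = static_cast<std::size_t>(
            std::llround(p * static_cast<double>(total_groups)));
        const std::size_t groups = std::min(want, remaining);
        sizes.push_back(groups * lws);
        remaining -= groups;
    }
    sizes.push_back(remaining * lws);  // last device takes the rest
    return sizes;
}
```

With the values from Listing 2 (gws = 512000, lws = 64), this sketch yields chunks of 40960, 153600, and 317440 work-items, which sum back to 512000.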
“…Introduced as an open standard, OpenCL is also designed for programming heterogeneous parallel systems. Some extensions exist [29] to enable the average OpenCL programmer to focus on the algorithm design rather than scheduling and to automatically gain performance without sacrificing programmability. After coding and running programs, it's important to evaluate the efficiency, the scalability and the portability of the code by using performance metrics for parallel programs (Def.…”
Section: Fundamental Basis For Parallelization (mentioning)
confidence: 99%