Heterogeneous Computing With OpenCL 2012
DOI: 10.1016/b978-0-12-387766-6.00034-7
OpenCL Extensions

Cited by 10 publications (14 citation statements); references 0 publications.
“…We used 120 image frames, and the image frame sizes were 1440 × 1080, 1280 × 720, 800 × 600, 720 × 480 and 640 × 480. In addition, we used OpenMP [8] and OpenCL [9] in order to parallelize feature extraction on the CPU and GPU. Note that, although OpenMP does not require the data copy time within a CPU, it does not provide detailed operations for asymmetric workload assignment between the host CPU core and the remaining CPU cores.…”
Section: Results
“…This provides the programmer with enough flexibility to choose the best architecture for the given task, or to select the task that optimally exploits the given platform. However, this flexibility comes at the expense of increased programming complexity [34].…”
Section: OpenCL
“…For more details on OpenCL, please refer to the specifications [35], the suggested bibliography [34], and available examples, such as [37,38].…”
Section: Main OpenCL Concepts
“…Such layers are meant (i) to encourage better partitioning of the problem towards fine-grained granularity and low communication, hence increasing the scalability to fully leverage a large number of CUs when available; and (ii) to potentially support more restricted compute architectures, by not strictly enforcing parallelism among CUs while still ensuring that the device is capable of synchronization, which can occur among PEs within each CU [15]. Figure 1 shows four scopes of memory, namely, global, constant, local, and private memories.…”
Section: Hardware Model
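The four memory scopes and the CU/PE synchronization rule from the quoted passage map directly onto OpenCL C address-space qualifiers. The device-code sketch below is illustrative (the kernel name and logic are invented, and it needs a host program and an OpenCL runtime to build and run); it touches each scope once and synchronizes only within a work-group, i.e. among PEs of one CU.

```c
/* Illustrative OpenCL C kernel: per-work-group scaled sum. */
__kernel void scaled_tile_sum(__global const float *in,   /* global: visible to all work-items   */
                              __constant float *scale,    /* constant: read-only across the NDRange */
                              __local float *tile,        /* local: shared within one work-group (CU) */
                              __global float *out)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    float x = in[gid] * scale[0];   /* private: per-work-item storage */

    tile[lid] = x;
    /* Synchronization is guaranteed only among PEs of the same CU,
     * i.e. work-items within one work-group. */
    barrier(CLK_LOCAL_MEM_FENCE);

    if (lid == 0) {
        float sum = 0.0f;
        for (int i = 0; i < (int)get_local_size(0); ++i)
            sum += tile[i];
        out[get_group_id(0)] = sum;
    }
}
```

There is deliberately no barrier spanning work-groups: as the quote notes, the model does not enforce parallelism (or synchronization) among CUs, which is what lets restricted architectures schedule work-groups one at a time.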