Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT 2016)
DOI: 10.1145/2967938.2967944
Bridging the Semantic Gaps of GPU Acceleration for Scale-out CNN-based Big Data Processing

Abstract: Convolutional Neural Networks (CNNs) have substantially advanced the state-of-the-art accuracy of object recognition, which is the core function of a myriad of modern multimedia processing techniques such as image/video processing, speech recognition, and natural language processing. GPU-based accelerators have gained increasing attention because the large number of highly parallel neurons in a CNN naturally matches the GPU computation pattern. In this work, we perform comprehensive experiments to investigate the per…

Cited by 17 publications (5 citation statements)
References 30 publications
“…Although CPUs have been proposed to accelerate CNNs by relying on multicore parallelism and SIMD instructions [14], [15], the number and complexity of the layers in modern CNN models make it very difficult to run the entire network on CPUs. To improve inference throughput, (fast) GPU solutions have been proposed to process a large amount of data [16], [17]. Field Programmable Gate Arrays (FPGAs), on the other hand, have been extensively used as an alternative to this problem as they offer good performance and reconfigurability [18]- [22].…”
Section: The NMP Architecture (mentioning)
confidence: 99%
“…It collects the whole network information, including the connection relationships of each layer, layer type, input tensor dimensions, whether the parameters of each layer need to be updated, and the memory footprint (through automatic inference), as well as GPU information such as the number of cores, registers, shared memory capacity, and peak FLOPS. Note that we use the same methods from [10], [17] to estimate the calculation time of each layer and each sub-task, and revise it by profiling a stand-alone run of one iteration, since CNN training has been shown to exhibit repetitive computation and predictability [34].…”
Section: System Overview (mentioning)
confidence: 99%
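The estimate-then-revise approach quoted above — predict per-layer cost, then correct the prediction by profiling a single stand-alone iteration, relying on the repetitive, predictable nature of CNN training — can be sketched as follows. This is a minimal illustration, not the cited system's code: the layer functions and names are hypothetical stand-ins for real CNN layers.

```python
import time

# Hypothetical stand-in "layers": each is a callable taking and returning data.
# In the cited system these would be real CNN layers (conv, pool, fc); simple
# arithmetic stands in here so the sketch is self-contained.
def conv_layer(x):
    return [v * 0.5 + 1.0 for v in x]

def pool_layer(x):
    return x[::2]

def fc_layer(x):
    return [sum(x)] * 4

def profile_one_iteration(layers, sample):
    """Time a single stand-alone forward pass, layer by layer.

    Because CNN training repeats near-identical work every iteration,
    one profiled iteration serves as a usable per-layer cost estimate
    for all subsequent iterations (the predictability noted above).
    """
    estimates = {}
    data = sample
    for name, layer in layers:
        start = time.perf_counter()
        data = layer(data)
        estimates[name] = time.perf_counter() - start
    return estimates

layers = [("conv1", conv_layer), ("pool1", pool_layer), ("fc1", fc_layer)]
est = profile_one_iteration(layers, list(range(10000)))
for name, seconds in est.items():
    print(f"{name}: {seconds:.6f}s")
```

A scheduler would then use these per-layer estimates to partition sub-tasks, re-profiling only if the measured iteration diverges from the estimate.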
“…vDNN [52] is a runtime memory manager that handles memory allocation and movement between CPU and GPU memory for DNN workloads. Song et al. [57] characterized the performance of GPU acceleration systems for CNN applications. They proposed a tuned GPU acceleration framework to bridge the gap between the uneven computing loads of different CNN layers and the fixed provisioning of computing capacity.…”
Section: Related Work (mentioning)
confidence: 99%