2020
DOI: 10.1109/access.2020.3039278

CPU-Accelerator Co-Scheduling for CNN Acceleration at the Edge

Abstract: Convolutional neural networks (CNNs) are widely deployed for many artificial intelligence (AI) applications, such as object detection and image classification. Due to the burgeoning revolution in edge AI, CNN hardware accelerators are also being employed in resource-constrained edge devices to achieve better performance and energy efficiency at the edge. Although CNN accelerators enable fast and energy-efficient CNN inference at the edge, the remaining hardware resources on the edge devices except for the CN…

Cited by 19 publications (6 citation statements)
References 14 publications
“…As for inference tasks, considering the data and network loading overhead of GPUs (Ma et al 2019), the data-transfer overhead between GPUs and CPUs, the energy consumption of GPUs, the limited flexibility for inference tasks with little parallelism, and the high latency of GPUs, GPUs compare unfavorably with CPUs. CPUs are more suitable for deep learning inference workloads in many cases (Mittal et al 2021; Kim et al 2019), and numerous studies have been made to optimize and accelerate DNNs on CPUs (de Prado et al 2021; Kim et al 2020; Low et al 2020; Putro et al 2021). Thus, in this paper, we focus on the study of DNN inference performance on CPUs.…”
Section: Discussion
confidence: 99%
“…The system presents a high degree of flexibility and supports the dynamic deployment of ML algorithms, which demonstrates an efficient and competitive performance of the proposed hardware for accelerating AI-based inference at the edge. Another example is presented in [112] by Kim et al., who propose a co-scheduling method to accelerate the convolution-layer operations of CNN inference at the edge by exploiting parallelism in the CNN output channels. The developed FPGA-based prototype showed a global performance improvement of up to 200% and an energy reduction between 14.9% and 49.7%.…”
Section: Edge-AI Levels
confidence: 99%
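
The statement above describes co-scheduling convolution work between the CPU and an accelerator by splitting a layer's output channels. As a rough illustration of that idea (not the interface or scheduling policy from the cited paper), the Python sketch below assigns a fixed fraction of the output channels to a placeholder accelerator call while the CPU computes the remaining channels concurrently; the function `accelerator_conv2d` and the split ratio `accel_ratio` are hypothetical names introduced here for the sketch.

```python
# Illustrative sketch only: splitting a convolution's output channels between
# an accelerator and the CPU, in the spirit of the co-scheduling idea quoted
# above. The accelerator interface and the static split ratio are assumptions.
import threading
import numpy as np

def cpu_conv2d(x, w):
    """Naive direct convolution on the CPU (stride 1, no padding).
    x: (C_in, H, W), w: (C_out, C_in, K, K) -> (C_out, H-K+1, W-K+1)."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out), dtype=x.dtype)
    for oc in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                y[oc, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[oc])
    return y

def accelerator_conv2d(x, w):
    """Placeholder for the hardware-accelerator call (hypothetical)."""
    return cpu_conv2d(x, w)  # stand-in so the sketch runs end to end

def co_scheduled_conv2d(x, w, accel_ratio=0.75):
    """Give the first `accel_ratio` fraction of output channels to the
    accelerator while the CPU computes the rest in parallel."""
    split = int(w.shape[0] * accel_ratio)
    result = {}

    def run_cpu():
        result["cpu"] = cpu_conv2d(x, w[split:])

    t = threading.Thread(target=run_cpu)
    t.start()
    result["accel"] = accelerator_conv2d(x, w[:split])
    t.join()
    return np.concatenate([result["accel"], result["cpu"]], axis=0)

if __name__ == "__main__":
    x = np.random.rand(3, 16, 16).astype(np.float32)
    w = np.random.rand(8, 3, 3, 3).astype(np.float32)
    assert np.allclose(co_scheduled_conv2d(x, w), cpu_conv2d(x, w), atol=1e-5)
```

In a real system the split would be tuned so that the accelerator's share and the CPU's share of the output channels finish at roughly the same time, which is where the reported performance and energy gains would come from.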
“…Since the greatest computational overhead of a CNN comes from the convolutional layers, it is necessary to accelerate their computation with a hardware accelerator. To address this overhead, there are three accelerator options for implementing the convolution process in a CNN: CPUs (central processing units), which provide multiply-and-add instructions [27,28,29]; GPUs (graphics processing units), which execute massively parallel operations [30,31]; and FPGAs (field-programmable gate arrays), which implement multiple operators in hardware [32,33]. Furthermore, with the increasing number of parameters in the fully connected layers of CNNs, the model size is growing significantly.…”
Section: Background and Definitions
confidence: 99%
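
To make the contrast in the quote above concrete, the short sketch below estimates multiply-and-add (MAC) counts and parameter counts for one convolutional layer and one fully connected layer; the layer shapes are hypothetical, VGG-like values chosen only for illustration and are not taken from the cited papers.

```python
# Back-of-the-envelope sketch (assumed layer shapes): convolutional layers
# dominate the MAC count, fully connected layers dominate the parameter count.
def conv_layer_cost(c_in, c_out, k, h_out, w_out):
    macs = c_out * h_out * w_out * c_in * k * k  # one MAC per kernel element per output pixel
    params = c_out * c_in * k * k
    return macs, params

def fc_layer_cost(n_in, n_out):
    macs = n_in * n_out  # one MAC per weight
    params = n_in * n_out
    return macs, params

if __name__ == "__main__":
    conv_macs, conv_params = conv_layer_cost(c_in=256, c_out=256, k=3, h_out=28, w_out=28)
    fc_macs, fc_params = fc_layer_cost(n_in=25088, n_out=4096)
    print(f"conv: {conv_macs / 1e6:.0f} M MACs, {conv_params / 1e6:.2f} M params")
    print(f"fc:   {fc_macs / 1e6:.0f} M MACs, {fc_params / 1e6:.2f} M params")
```

With these assumed shapes the convolutional layer needs hundreds of millions of MACs but well under a million parameters, while the fully connected layer shows the opposite profile, which is why convolution dominates compute time and the fully connected layers dominate model size.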