Single computation engines have become a popular design choice for FPGA-based convolutional neural networks (CNNs), enabling the deployment of diverse models without fabric reconfiguration. This flexibility, however, often comes at the cost of significantly reduced performance on memory-bound layers and of resource underutilisation due to the suboptimal mapping of certain layers on the engine's fixed configuration. In this work, we investigate the implications for CNN engine design of a class of models that introduce a pre-convolution stage to decompress the weights at run time; we refer to these approaches as on-the-fly. To minimise the negative impact of limited bandwidth on memory-bound layers, we present a novel hardware component that enables the on-chip, on-the-fly generation of weights. We further introduce an input-selective processing element (PE) design that balances the load between PEs on suboptimally mapped layers. Finally, we present unzipFPGA, a framework to train on-the-fly models and traverse the design space to select the highest-performing CNN engine configuration. Quantitative evaluation shows that unzipFPGA yields an average speedup of 2.14× and 71% over optimised status-quo and pruned CNN engines, respectively, under constrained bandwidth, and delivers up to 3.69× higher performance density than state-of-the-art FPGA-based CNN accelerators.
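To make the on-the-fly idea concrete, the sketch below illustrates the general principle in software terms: weights are kept in a compressed form (here, codebook indices, one possible compression scheme) and expanded tile-by-tile immediately before the convolution consumes them, so only a small dense tile exists at any time. This is a minimal, illustrative sketch assuming a simple vector-quantisation codebook; all function names are hypothetical, and the paper's actual component is an on-chip hardware unit in a pre-convolution stage, not NumPy code.

```python
# Illustrative (hypothetical) sketch of on-the-fly weights generation:
# compressed weights are decompressed per output-channel tile just
# before use, mimicking reduced off-chip weight traffic. Not the
# paper's hardware design.
import numpy as np

def decompress_tile(indices: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Expand a tile of codebook indices into dense weights (gather)."""
    return codebook[indices]  # each index selects one codeword

def conv2d_on_the_fly(x, w_indices, codebook, tile=16):
    """Naive 2D convolution that generates weights one tile at a time.

    x          : (in_ch, H, W) input feature map
    w_indices  : (out_ch, in_ch, k, k) integer codebook indices
    codebook   : (num_codewords,) dense codeword values
    """
    out_ch, in_ch, k, _ = w_indices.shape
    h, w = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((out_ch, h, w))
    for oc0 in range(0, out_ch, tile):
        # Only this tile's dense weights are materialised at once.
        wt = decompress_tile(w_indices[oc0:oc0 + tile], codebook)
        for oc in range(wt.shape[0]):
            for i in range(h):
                for j in range(w):
                    y[oc0 + oc, i, j] = np.sum(wt[oc] * x[:, i:i + k, j:j + k])
    return y

# Example usage with hypothetical sizes: 4-bit indices into 16 codewords.
rng = np.random.default_rng(0)
codebook = rng.standard_normal(16)
w_idx = rng.integers(0, 16, (32, 3, 3, 3))  # 32 out-ch, 3 in-ch, 3x3 kernels
x = rng.standard_normal((3, 8, 8))
y = conv2d_on_the_fly(x, w_idx, codebook)   # y has shape (32, 6, 6)
```

In hardware, the gather in decompress_tile would be performed by the dedicated on-chip component in parallel with computation, which is what allows memory-bound layers to trade scarce off-chip bandwidth for on-chip decompression work.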