2018
DOI: 10.1109/access.2018.2877890
Benchmark Analysis of Representative Deep Neural Network Architectures

Cited by 681 publications (375 citation statements)
References 12 publications
“…4) Effect of the feature extractor: The effect of the feature extractor for Faster R-CNN on AP is very limited, except at a high IoU threshold (0.9) on the Stanford dataset, as can be seen in Figure 19 and Figure 20. Nevertheless, in terms of inference speed, the Inception-v2 feature extractor is significantly faster than ResNet50 (Figures 21 and 22), which is consistent with the findings of Bianco et al. [31], who also showed that Inception-v2 (a.k.a. BN-Inception) is less computationally complex. 5) Effect of the input size: Figures 21 and 22 show a significant gain in YOLOv3's AP when moving from a 320x320 input size to 416x416, but performance stagnates when moving further to 608x608, which means that the 416x416 resolution is sufficient to detect the objects of the two datasets.…”
supporting
confidence: 87%
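The speed comparison in this statement can be reproduced in spirit with a small latency micro-benchmark. The sketch below uses torchvision stand-ins (torchvision ships GoogLeNet rather than BN-Inception/Inception-v2, so GoogLeNet is used here as the closest Inception-family proxy); timings, model choices, and the 224x224 input size are illustrative assumptions, not the cited paper's protocol.

```python
# Minimal CPU latency sketch comparing two backbone families.
# Assumption: torchvision's googlenet stands in for BN-Inception/Inception-v2.
import time
import torch
import torchvision.models as models

def mean_latency_ms(model, size, runs=20, warmup=5):
    """Average single-image forward-pass time at a square input size."""
    model.eval()
    x = torch.randn(1, 3, size, size)
    with torch.no_grad():
        for _ in range(warmup):          # warm-up passes, not timed
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1e3

for name, net in [("resnet50", models.resnet50()),
                  ("googlenet", models.googlenet())]:
    print(f"{name}: {mean_latency_ms(net, 224):.1f} ms @ 224x224")
```

The same harness, looped over sizes such as 320, 416, and 608, would show the input-size/latency trade-off that the quoted point 5 discusses for YOLOv3.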
“…On the other hand, the main hyperparameter for Faster R-CNN is the feature extractor. We tested two different feature extractors: Inception-v2 [30] (also called BN-Inception in the literature [31]) and ResNet50 [32]. These settings yield a total of 5 classifiers, which we trained and tested on the two datasets described above, amounting to 10 experiments summarized in Table VI.…”
Section: B. Hyperparameters
mentioning
confidence: 99%
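Reading this statement together with the previous one, the 5 classifiers appear to be YOLOv3 at three input sizes plus Faster R-CNN with two feature extractors, crossed with 2 datasets. A hedged sketch of that experiment grid follows; all identifiers (dataset names, config keys) are hypothetical, not the citing paper's code.

```python
# Hypothetical enumeration of the 5 x 2 = 10 experiments described above.
# Only the Stanford dataset is named in the quotes; the second name is a placeholder.
from itertools import product

classifiers = [
    ("yolov3", {"input_size": 320}),
    ("yolov3", {"input_size": 416}),
    ("yolov3", {"input_size": 608}),
    ("faster_rcnn", {"backbone": "inception_v2"}),
    ("faster_rcnn", {"backbone": "resnet50"}),
]
datasets = ["stanford", "dataset_2"]  # placeholder names

for (name, cfg), data in product(classifiers, datasets):
    print(f"train/eval {name} {cfg} on {data}")  # 10 runs in total
```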
“…The incorporation of specific platform constraints into such approaches involves modeling how the network architecture relates to the optimization target. As a first step toward modeling the performance of embedded CNNs, recent studies have carried out systematic benchmarking on several hardware systems [6, 27–29]. More specifically, an energy estimation methodology for CNN accelerators has been introduced in [30, 31].…”
Section: Related Work
mentioning
confidence: 99%
“…We also compare to well-established standard CNN architectures [29, 28, 13, 37, 26]. For the accuracies, FLOPs, and parameter counts of standard CNNs, we use the benchmark analysis of Bianco et al. [1]. Compared to other standard CNN architectures, the accuracy of our networks is superior to MobileNetV2 [26], GoogLeNet [29], and VGG [28].…”
Section: Multi-stage Shifting
mentioning
confidence: 99%
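The parameter-count side of such a comparison is easy to reproduce with torchvision. A minimal sketch follows: it counts trainable parameters of the three named architectures; randomly initialized models suffice for this, while accuracy and FLOP figures come from published benchmarks such as the one under discussion (FLOP counting would need an extra tool and is not shown).

```python
# Count trainable parameters of standard CNNs via torchvision.
# Sketch only: models are randomly initialized, which does not affect counts.
import torchvision.models as models

nets = {
    "vgg16": models.vgg16(),
    "googlenet": models.googlenet(),
    "mobilenet_v2": models.mobilenet_v2(),
}
for name, net in nets.items():
    n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```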