2017
DOI: 10.48550/arxiv.1704.04861
Preprint

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Abstract: We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. We introduce two simple global hyperparameters that efficiently trade off between latency and accuracy. These hyperparameters allow the model builder to choose the right-sized model for their application based on the constraints of the problem. We present extensive experime…
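As a concrete illustration of the depthwise separable factorization and the width multiplier described in the abstract, here is a minimal PyTorch sketch. The depthwise-then-pointwise layer ordering with BatchNorm and ReLU follows the paper; the module name, the `alpha` argument, and the channel handling are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of one MobileNet-style depthwise separable block (PyTorch).
# `in_ch`/`out_ch` are the base (alpha = 1.0) channel counts; the width
# multiplier alpha uniformly thins every layer, as described in the paper.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, alpha: float = 1.0):
        super().__init__()
        in_ch, out_ch = int(in_ch * alpha), int(out_ch * alpha)
        self.block = nn.Sequential(
            # Depthwise 3x3: one filter per input channel (groups = in_ch)
            nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # Pointwise 1x1: linearly combines the depthwise outputs
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# The second hyperparameter, the resolution multiplier, simply shrinks the
# network input, e.g. feeding 160x160 instead of 224x224 crops.
block = DepthwiseSeparableConv(64, 128, stride=2, alpha=0.75)
out = block(torch.randn(1, 48, 56, 56))  # 48 = int(64 * 0.75)
```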

Cited by 3,910 publications (5,353 citation statements)
References 29 publications
“…We show that by progressively fine-tuning the models, from the dataset furthest from the target domain to the dataset closest to the target domain, large gains in performance can be achieved in terms of MR⁻² on the Reasonable subset of Caltech (3.7%) and CityPersons (1.5%) without fine-tuning on the target domain. These improvements hold true for models from all pedestrian detection families that we tested, such as Cascade R-CNN [8], Faster R-CNN [38], and embedded-vision backbones such as MobileNet [21]. Finally, we also compare the generalization ability of CNNs against the recent transformer network (Swin Transformer) [30]. We illustrate that, despite the superior performance of the Swin Transformer [30], it struggles when the domain shift is large, in comparison with CNNs.…”
Section: Introduction (mentioning)
confidence: 76%
“…In this section, we conducted experiments to show that pre-training on dense and diverse datasets can help a lightweight neural network architecture, MobileNet [21], achieve results competitive with state-of-the-art detectors, such as CSP, on the CityPersons [52] dataset.…”
Section: Application Oriented Models (mentioning)
confidence: 99%
“…Syamsuri et al. [13] proposed a detection system with optimal performance and latency for both personal computers and mobile phones. The authors investigated MobileNet [14], Mobile NASNet [15], and InceptionV3 [16] for resource-constrained devices in the development of various applications. Resource utilization is compared in terms of memory, CPU, and battery use.…”
Section: Related Work (mentioning)
confidence: 99%
“…Since m and m* have different activation maps for each channel, we derive the scaling and shifting parameters for each channel independently. That is, we employ depthwise convolution [12] to estimate the transforming parameters from the j-th channel of m and m*. Specifically, this procedure can be formulated as follows:…”
Section: Self Pixel-wise Normalization (mentioning)
confidence: 99%
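To make the per-channel transform in the excerpt above concrete, here is a hedged PyTorch sketch: a depthwise convolution (groups = channels) estimates pixel-wise scale and shift parameters for each channel independently. The class name, the instance normalization, and the pairing of gamma with m and beta with m* are assumptions made for illustration; the cited paper's actual formulation is truncated in the excerpt and is not reproduced here.

```python
# Hypothetical sketch: per-channel scale/shift estimated via depthwise
# convolution, in the spirit of the excerpt above (not the paper's code).
import torch
import torch.nn as nn

class SelfPixelwiseNorm(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # groups=channels: output channel j depends only on input channel j,
        # so the transforming parameters are derived per channel independently.
        self.gamma = nn.Conv2d(channels, channels, kernel_size,
                               padding=pad, groups=channels)
        self.beta = nn.Conv2d(channels, channels, kernel_size,
                              padding=pad, groups=channels)
        self.norm = nn.InstanceNorm2d(channels, affine=False)

    def forward(self, x, m, m_star):
        # x: features to normalize; m, m_star: activation maps from which the
        # pixel-wise scaling (gamma) and shifting (beta) are estimated.
        return self.norm(x) * (1 + self.gamma(m)) + self.beta(m_star)

feat = torch.randn(1, 32, 16, 16)
spn = SelfPixelwiseNorm(32)
y = spn(feat, torch.randn_like(feat), torch.randn_like(feat))
```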