2017
DOI: 10.48550/arxiv.1704.04861
Preprint

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Abstract: We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. We introduce two simple global hyperparameters that efficiently trade off between latency and accuracy. These hyperparameters allow the model builder to choose the right-sized model for their application based on the constraints of the problem. We present extensive experime…
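As a concrete illustration of the depthwise separable factorization and the width multiplier described in the abstract, here is a minimal PyTorch sketch. The depthwise-then-pointwise layer ordering with BatchNorm and ReLU follows the paper; the module name, the `alpha` argument, and the channel handling are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of one MobileNet-style depthwise separable block (PyTorch).
# `in_ch`/`out_ch` are the base (alpha = 1.0) channel counts; the width
# multiplier alpha uniformly thins every layer, as described in the paper.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, alpha: float = 1.0):
        super().__init__()
        in_ch, out_ch = int(in_ch * alpha), int(out_ch * alpha)
        self.block = nn.Sequential(
            # Depthwise 3x3: one filter per input channel (groups = in_ch)
            nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # Pointwise 1x1: linearly combines the depthwise outputs
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# The second hyperparameter, the resolution multiplier, simply shrinks the
# network input, e.g. feeding 160x160 instead of 224x224 crops.
block = DepthwiseSeparableConv(64, 128, stride=2, alpha=0.75)
out = block(torch.randn(1, 48, 56, 56))  # 48 = int(64 * 0.75)
```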

Cited by 3,910 publications (5,353 citation statements)
References 29 publications
“…We show that by progressively fine-tuning the models, from the dataset furthest from the target domain to the dataset closest to the target domain, large gains in performance can be achieved in terms of MR⁻² on the Reasonable subset of Caltech (3.7%) and CityPersons (1.5%) without fine-tuning on the target domain. These improvements hold true for models from all pedestrian detection families that we tested, such as Cascade R-CNN [8], Faster R-CNN [38], and embedded-vision backbones such as MobileNet [21]. Finally, we also compare the generalization ability of CNNs against the recent transformer network (Swin Transformer) [30]. We illustrate that, despite the superior performance of the Swin Transformer [30], it struggles when the domain shift is large, in comparison with CNNs.…”
Section: Introduction (mentioning)
confidence: 76%
“…In this section, we conducted experiments to show that pre-training on dense and diverse datasets can help a lightweight neural network architecture, MobileNet [21], achieve results competitive with state-of-the-art detectors, such as CSP, on the CityPersons [52] dataset.…”
Section: Application Oriented Models (mentioning)
confidence: 99%
“…Syamsuri et al. [13] proposed a detection system with optimal performance and latency for both personal computers and mobile phones. The authors investigated MobileNet [14], Mobile NASNet [15], and InceptionV3 [16] for resource-constrained devices in the development of various applications. Resource utilization is compared in terms of memory, CPU, and battery use.…”
Section: Related Work (mentioning)
confidence: 99%
“…Since m and m* have different activation maps for each channel, we derive the scaling and shifting parameters for each channel independently. That is, we employ depthwise convolution [12] to estimate the transforming parameters from the j-th channel of m and m*. Specifically, this procedure can be formulated as follows:…”
Section: Self Pixel-wise Normalization (mentioning)
confidence: 99%
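To make the per-channel transform in the excerpt above concrete, here is a hedged PyTorch sketch: a depthwise convolution (groups = channels) estimates pixel-wise scale and shift parameters for each channel independently. The class name, the instance normalization, and the pairing of gamma with m and beta with m* are assumptions made for illustration; the cited paper's actual formulation is truncated in the excerpt and is not reproduced here.

```python
# Hypothetical sketch: per-channel scale/shift estimated via depthwise
# convolution, in the spirit of the excerpt above (not the paper's code).
import torch
import torch.nn as nn

class SelfPixelwiseNorm(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # groups=channels: output channel j depends only on input channel j,
        # so the transforming parameters are derived per channel independently.
        self.gamma = nn.Conv2d(channels, channels, kernel_size,
                               padding=pad, groups=channels)
        self.beta = nn.Conv2d(channels, channels, kernel_size,
                              padding=pad, groups=channels)
        self.norm = nn.InstanceNorm2d(channels, affine=False)

    def forward(self, x, m, m_star):
        # x: features to normalize; m, m_star: activation maps from which the
        # pixel-wise scaling (gamma) and shifting (beta) are estimated.
        return self.norm(x) * (1 + self.gamma(m)) + self.beta(m_star)

feat = torch.randn(1, 32, 16, 16)
spn = SelfPixelwiseNorm(32)
y = spn(feat, torch.randn_like(feat), torch.randn_like(feat))
```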