2022
DOI: 10.31219/osf.io/4yz8f
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Learning strides in convolutional neural networks

Abstract: Convolutional neural networks typically contain several downsampling operators, such as strided convolutions or pooling layers, that progressively reduce the resolution of intermediate representations. This provides some shift-invariance while reducing the computational complexity of the whole architecture. A critical hyperparameter of such layers is their stride: the integer factor of downsampling. As strides are not differentiable, finding the best configuration either requires cross-validation or discrete o… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
13
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
4
4
1

Relationship

0
9

Authors

Journals

citations
Cited by 14 publications
(13 citation statements)
references
References 25 publications
0
13
0
Order By: Relevance
“…With the emerge of deep neural networks (DNNs) [ [9] , [10] , [11] , [12] , [13] , [14] ], especially convolutional neural networks (CNNs), they leverage multi-level layer neural networks for representational learning and are widely used for image classification [ 15 , 16 ], object detection [ 17 , 18 ] and semantic segmentation [ 19 ]. Naturally, DNNs are very good at detecting COVID-19 [ [20] , [21] , [22] , [23] , [24] , [25] ].…”
Section: Introductionmentioning
confidence: 99%
“…With the emerge of deep neural networks (DNNs) [ [9] , [10] , [11] , [12] , [13] , [14] ], especially convolutional neural networks (CNNs), they leverage multi-level layer neural networks for representational learning and are widely used for image classification [ 15 , 16 ], object detection [ 17 , 18 ] and semantic segmentation [ 19 ]. Naturally, DNNs are very good at detecting COVID-19 [ [20] , [21] , [22] , [23] , [24] , [25] ].…”
Section: Introductionmentioning
confidence: 99%
“…All the datasets are resampled at a sampling rate of 16 kHz. Following the evaluation protocol in the previous works (Zeghidour et al 2021;Riad et al 2021;Kong et al 2020;Gong, Chung, and Glass 2021b), we report the mean average precision (mAP) as the main evaluation metric on AudioSet and FSD50K, and report classification accuracy (ACC) on other datasets. In all experiments, we use the same architecture as used by Gong, Chung, and Glass (2021b), which is an EfficientNet-B2 (Tan and Le 2019) with four attention heads (13.6 M parameters).…”
Section: Methodsmentioning
confidence: 99%
“…Learning Feature Resolution. One recent work on learning feature resolution for audio classification is Diff-Stride (Riad et al 2021), which learns stride in convolutional neural network (CNN) in a differentiable way and outperforms previous methods using fixed stride settings. By comparison, DiffStride needs to be applied in each CNN layer and can only learn a single fixed stride setting, while DiffRes is a one-layer lightweight algorithm and can personalize the best temporal resolution for each audio during inference.…”
Section: Related Workmentioning
confidence: 99%
“…Kernel size is the kernel filter's size or the sliding kernel [22]. Stride length is the number of kernels that slide before making product points and creating output pixels [23]. Padding is the size of the 0-th frame set up around the input feature map [24].…”
Section: Fig 3 Convolutional Layermentioning
confidence: 99%