Learning strides in convolutional neural networks

Riad, Rachid; Teboul, Olivier; Grangier, David; Zeghidour, Neil

doi:10.31219/osf.io/4yz8f

Cited by 14 publications

(13 citation statements)

References 25 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…With the emerge of deep neural networks (DNNs) [ [9] , [10] , [11] , [12] , [13] , [14] ], especially convolutional neural networks (CNNs), they leverage multi-level layer neural networks for representational learning and are widely used for image classification [ 15 , 16 ], object detection [ 17 , 18 ] and semantic segmentation [ 19 ]. Naturally, DNNs are very good at detecting COVID-19 [ [20] , [21] , [22] , [23] , [24] , [25] ].…”

Section: Introductionmentioning

confidence: 99%

COVID-19 diagnosis via chest X-ray image classification based on multiscale class residual attention

Liu

Cai

Tang

et al. 2022

Computers in Biology and Medicine

View full text Add to dashboard Cite

Section: Introductionmentioning

confidence: 99%

COVID-19 diagnosis via chest X-ray image classification based on multiscale class residual attention

Liu

Cai

Tang

et al. 2022

Computers in Biology and Medicine

View full text Add to dashboard Cite

“…All the datasets are resampled at a sampling rate of 16 kHz. Following the evaluation protocol in the previous works (Zeghidour et al 2021;Riad et al 2021;Kong et al 2020;Gong, Chung, and Glass 2021b), we report the mean average precision (mAP) as the main evaluation metric on AudioSet and FSD50K, and report classification accuracy (ACC) on other datasets. In all experiments, we use the same architecture as used by Gong, Chung, and Glass (2021b), which is an EfficientNet-B2 (Tan and Le 2019) with four attention heads (13.6 M parameters).…”

Section: Methodsmentioning

confidence: 99%

“…Learning Feature Resolution. One recent work on learning feature resolution for audio classification is Diff-Stride (Riad et al 2021), which learns stride in convolutional neural network (CNN) in a differentiable way and outperforms previous methods using fixed stride settings. By comparison, DiffStride needs to be applied in each CNN layer and can only learn a single fixed stride setting, while DiffRes is a one-layer lightweight algorithm and can personalize the best temporal resolution for each audio during inference.…”

Section: Related Workmentioning

confidence: 99%

Learning Temporal Resolution in Spectrogram for Audio Classification

Liu,

Kong

et al. 2024

AAAI

View full text Add to dashboard Cite

The audio spectrogram is a time-frequency representation that has been widely used for audio classification. One of the key attributes of the audio spectrogram is the temporal resolution, which depends on the hop size used in the Short-Time Fourier Transform (STFT). Previous works generally assume the hop size should be a constant value (e.g., 10 ms). However, a fixed temporal resolution is not always optimal for different types of sound. The temporal resolution affects not only classification accuracy but also computational cost. This paper proposes a novel method, DiffRes, that enables differentiable temporal resolution modeling for audio classification. Given a spectrogram calculated with a fixed hop size, DiffRes merges non-essential time frames while preserving important frames. DiffRes acts as a "drop-in" module between an audio spectrogram and a classifier and can be jointly optimized with the classification task. We evaluate DiffRes on five audio classification tasks, using mel-spectrograms as the acoustic features, followed by off-the-shelf classifier backbones. Compared with previous methods using the fixed temporal resolution, the DiffRes-based method can achieve the equivalent or better classification accuracy with at least 25% computational cost reduction. We further show that DiffRes can improve classification accuracy by increasing the temporal resolution of input acoustic features, without adding to the computational cost.

show abstract

“…Kernel size is the kernel filter's size or the sliding kernel [22]. Stride length is the number of kernels that slide before making product points and creating output pixels [23]. Padding is the size of the 0-th frame set up around the input feature map [24].…”

Section: Fig 3 Convolutional Layermentioning

confidence: 99%

Understanding of Convolutional Neural Network (CNN): A Review

Purwono

Ma’arif

Rahmaniar

et al. 2023

IJRCS

View full text Add to dashboard Cite

The application of deep learning technology has increased rapidly in recent years. Technologies in deep learning increasingly emulate natural human abilities, such as knowledge learning, problem-solving, and decision-making. In general, deep learning can carry out self-training without repetitive programming by humans. Convolutional neural networks (CNNs) are deep learning algorithms commonly used in wide applications. CNN is often used for image classification, segmentation, object detection, video processing, natural language processing, and speech recognition. CNN has four layers: convolution layer, pooling layer, fully connected layer, and non-linear layer. The convolutional layer uses kernel filters to calculate the convolution of the input image by extracting the fundamental features. The pooling layer combines two successive convolutional layers. The third layer is the fully connected layer, commonly called the convolutional output layer. The activation function defines the output of a neural network, such as 'yes' or 'no'. The most common and popular CNN activation functions are Sigmoid, Tanh, ReLU, Leaky ReLU, Noisy ReLU, and Parametric Linear Units. The organization and function of the visual cortex greatly influence CNN architecture because it is designed to resemble the neuronal connections in the human brain. Some of the popular CNN architectures are LeNet, AlexNet and VGGNet.

show abstract

Learning strides in convolutional neural networks

Cited by 14 publications

References 25 publications

COVID-19 diagnosis via chest X-ray image classification based on multiscale class residual attention

COVID-19 diagnosis via chest X-ray image classification based on multiscale class residual attention

Learning Temporal Resolution in Spectrogram for Audio Classification

Understanding of Convolutional Neural Network (CNN): A Review

Contact Info

Product

Resources

About