Sound event detection with binary neural networks on tightly power-constrained IoT devices

Cerutti, G.; Andri, Renzo; Cavigelli, Lukas; Magno, Michele; Farella, Elisabetta; Benini, Luca

doi:10.1145/3370748.3406588

Cited by 35 publications

(26 citation statements)

References 30 publications

“…The current state-of-the-art XNOR models for the benchmark image datasets are: [23] for MNIST, [24] for CIFAR-10 and CIFAR-100 and [25] for ImageNet. In contrast, we have found [26] to be the only XNOR-Net based work for audio classification. However, that work uses spectrograms as the input to the model which is similar to image classification using XNOR-Net.…”

Section: Binary Conv Layer Figure 5 Typical Convolution Layer Vs Binary Convolution Layer (mentioning)

confidence: 56%

“…To the best of our knowledge, Cerutti et al [26] have presented the only XNOR network for audio classification so far, termed BNN-GAP8. They have used a different benchmark, namely, the less commonly used AudioEvent dataset [43].…”

Section: Comparing With Existing Work (mentioning)

confidence: 99%

“…There are many state-of-the-art XNOR-Net models for different computer vision tasks, such as Rastegari et al [23] for MNIST, Cong [24] for CIFAR-10, and Bulat et al [25] for CIFAR-100 and ImageNet datasets. However, for audio classification, the only work we are aware of is presented by Cerutti et al [26]. This study uses XNOR-Net in combination with spectrogram-based input to effectively perform audio classification via image classification.…”

Section: Introduction (mentioning)

confidence: 99%

Pruning vs XNOR-Net: A Comprehensive Study of Deep Learning for Audio Classification on Edge-Devices

2022

Deep learning has celebrated resounding successes in many application areas of relevance to the Internet of Things (IoT), such as computer vision and machine listening. These technologies must ultimately be brought directly to the edge to fully harness the power of deep learning for the IoT. The obvious challenge is that deep learning techniques can only be implemented on strictly resource-constrained edge devices if the models are radically downsized. This task relies on different model compression techniques, such as network pruning, quantization, and the recent advancement of XNOR-Net. This study examines the suitability of these techniques for audio classification on microcontrollers. We present an application of XNOR-Net for end-to-end raw audio classification and a comprehensive empirical study comparing this approach with pruning-and-quantization methods. We show that raw audio classification with XNOR yields comparable performance to regular full-precision networks for small numbers of classes while reducing memory requirements 32-fold and computation requirements 58-fold. However, as the number of classes increases significantly, performance degrades, and pruning-and-quantization-based compression techniques take over as the preferred technique, being able to satisfy the same space constraints but requiring approximately 8x more computation. We show that these insights are consistent between raw audio classification and image classification using standard benchmark sets. To the best of our knowledge, this is the first study to apply XNOR to end-to-end audio classification and evaluate it in the context of alternative techniques. All code is publicly available on GitHub.
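
The memory and computation savings quoted in this abstract come from replacing floating-point multiply-accumulates with bitwise operations on packed sign bits. The following is a minimal sketch of that idea, not code from either paper: the helper names `pack_bits` and `xnor_popcount_dot` are hypothetical, and the sketch assumes weights and activations have already been binarized to +1/-1. Storing one bit per value in place of a 32-bit float is consistent with the 32-fold memory figure.

```python
import random


def pack_bits(values):
    """Pack a list of +1/-1 values into one int (+1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n {+1, -1} vectors given their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")     # popcount of matching positions
    return 2 * matches - n              # matches minus mismatches


# Check the bitwise result against an ordinary multiply-accumulate.
random.seed(0)
n = 64
a = [random.choice((-1, 1)) for _ in range(n)]
b = [random.choice((-1, 1)) for _ in range(n)]
reference = sum(x * y for x, y in zip(a, b))
result = xnor_popcount_dot(pack_bits(a), pack_bits(b), n)
assert result == reference
```

On real hardware the XNOR and popcount steps each cover a whole machine word per instruction, which is where the computational saving comes from; the Python version only illustrates the arithmetic.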

“…It has been known for some time that quantization of network weights to 5 bits and less is possible without a loss in accuracy in comparison to a 32-bit floating-point baseline model [5], [6], [7]. Further quantization of network weights to binary or ternary precision usually results in a small drop in accuracy, but precision is still adequate for many applications [12], [13], [29], [30]. Extending the approach of extreme quantization to intermediate activations, fully binarized and fully ternarized networks have been proposed [9], [15].…”

Section: A Aggressively Quantized Neural Network (mentioning)

confidence: 99%

CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration With Better-Than-Binary Energy Efficiency

Scherer, Rutishauser, Cavigelli et al. 2022

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

Self Cite

We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks. CUTIE, the Completely Unrolled Ternary Inference Engine, focuses on minimizing noncomputational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by 1) a data path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while decreasing the overall core inference energy cost by a factor of 4.8×-21×.

“…The first two BNNs have been used in embedded applications. Cerutti et al presented a BNN for Sound Event Detection on the Freesound database with 28 classes [26]. The audio data is converted to a Mel-frequency cepstral spectrogram and fed to a binary CNN with 5 layers with 3×3 kernels, followed by 3 layers with 1×1 kernels.…”

Section: Accuracy and Energy-efficiency For Various BNNs (mentioning)

confidence: 99%

ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

Andri, Karunaratne, Cavigelli et al. 2021

2021 IEEE International Symposium on Circuits and Systems (ISCAS)

Self Cite

Binary Neural Networks enable smart IoT devices, as they significantly reduce the required memory footprint and computational complexity while retaining a high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7 mm² binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology. By exploiting efficient data re-use, data buffering, latch-based memories, and voltage scaling, a throughput of 241 GOPS is achieved while consuming just 1.1 mW at 0.4 V/154 MHz during inference of binary CNNs with up to 7×7 kernels, leading to a peak core energy efficiency of 223 TOPS/W. ChewBaccaNN's flexibility allows it to run a much wider range of binary CNNs than other accelerators, drastically improving the accuracy-energy tradeoff beyond what can be captured by the TOPS/W metric. In fact, it can perform CIFAR-10 inference at 86.8% accuracy with merely 1.3 µJ, thus exceeding the accuracy while at the same time lowering the energy cost by 2.8× compared to even the most efficient and much larger analog processing-in-memory devices, while keeping the flexibility of running larger CNNs for higher accuracy when needed. It also runs a binary ResNet-18 trained on the 1000-class ILSVRC dataset and improves the energy efficiency by 4.4× over accelerators of similar flexibility. Furthermore, it can perform inference on a binarized ResNet-18 trained with 8-bases Group-Net to achieve a 67.5% Top-1 accuracy with only 3.0 mJ/frame, at an accuracy drop of merely 1.8% from the full-precision ResNet-18.
