Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design 2020
DOI: 10.1145/3370748.3406588

Sound event detection with binary neural networks on tightly power-constrained IoT devices

Abstract: Sound event detection (SED) is a hot topic in consumer and smart city applications. Existing approaches based on deep neural networks (DNNs) are very effective, but highly demanding in terms of memory, power, and throughput when targeting ultra-low power always-on devices. Latency, availability, cost, and privacy requirements are pushing recent IoT systems to process the data on the node, close to the sensor, with a very limited energy supply, and tight constraints on the memory size and processing capabilities…

Cited by 35 publications (26 citation statements)
References 30 publications
“…The current state-of-the-art XNOR models for the benchmark image datasets are: [23] for MNIST, [24] for CIFAR-10 and CIFAR-100 and [25] for ImageNet. In contrast, we have found [26] to be the only XNOR-Net based work for audio classification. However, that work uses spectrograms as the input to the model which is similar to image classification using XNOR-Net.…”
Section: Binary Conv Layer (Figure 5: Typical Convolution Layer vs. Binary Convolution Layer)
Citation type: mentioning
confidence: 56%
“…To the best of our knowledge, Cerutti et al [26] have presented the only XNOR network for audio classification so far, termed BNN-GAP8. They have used a different benchmark, namely, the less commonly used AudioEvent dataset [43].…”
Section: Comparing With Existing Work
Citation type: mentioning
confidence: 99%
“…It has been known for some time that quantization of network weights to 5 bits and less is possible without a loss in accuracy in comparison to a 32-bit floating-point baseline model [5], [6], [7]. Further quantization of network weights to binary or ternary precision usually results in a small drop in accuracy, but precision is still adequate for many applications [12], [13], [29], [30]. Extending the approach of extreme quantization to intermediate activations, fully binarized and fully ternarized networks have been proposed [9], [15].…”
Section: A. Aggressively Quantized Neural Network
Citation type: mentioning
confidence: 99%
“…The first two BNNs have been used in embedded applications. Cerutti et al presented a BNN for Sound Event Detection on the Freesound database with 28 classes [26]. The audio data is converted to a Mel-frequency cepstral spectrogram and fed to a binary CNN with 5 layers with 3×3 kernels, followed by 3 layers with 1×1 kernels.…”
Section: Accuracy and Energy-efficiency For Various BNNs
Citation type: mentioning
confidence: 99%
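
Several of the statements above refer to XNOR-style binary networks. As an illustration only, and not the implementation used in any of the cited works, the following minimal Python sketch shows how a dot product between two vectors binarized to {-1, +1} can be computed with an XNOR followed by a population count. The helper names `binarize`, `pack_bits`, and `xnor_popcount_dot` are hypothetical, and mapping zero to +1 is an assumption made here for simplicity.

```python
def binarize(values):
    """Map real values to {-1, +1} by sign (assumption: 0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a {-1, +1} vector into an int: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask    # XNOR: 1 where the signs match
    matches = bin(agree).count("1")      # population count
    return 2 * matches - n               # matches minus mismatches


# Compare against the plain dot product of the binarized vectors.
a = binarize([0.5, -1.2, 3.0, -0.1])
b = binarize([1.0, 2.0, -0.5, -4.0])
reference = sum(x * y for x, y in zip(a, b))
fast = xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a))
print(reference == fast)
```

On hardware, the same idea replaces multiply-accumulate operations with bitwise logic on packed words, which is one reason binary networks are attractive for power-constrained devices.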