2018
DOI: 10.1109/access.2018.2800728

A Convolutional Neural Network Smartphone App for Real-Time Voice Activity Detection

Abstract: This paper presents a smartphone app that performs real-time voice activity detection based on a convolutional neural network. Real-time implementation issues are discussed showing how the slow inference time associated with convolutional neural networks is addressed. The developed smartphone app is meant to act as a switch for noise reduction in the signal processing pipelines of hearing devices, enabling noise estimation or classification to be conducted in noise-only parts of noisy speech signals. The develop…

Cited by 103 publications (50 citation statements)
References 20 publications
“…With the rapid development of advanced communication technology, mobile wireless and Voice over IP (VoIP) are widely used around the world, and speech steganography has increasingly high value covering the secure and covert communication. Speech is a special case of audio signals, and it is different from the typical audio signals in terms of spectral bandwidth, intensity distribution, and signal continuity [4]- [6]. In general, methods designed for audio steganography are not suitable for speech steganography because those methods take the media object as continuous signal and do not consider the speech characteristics.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, with the popularization of smart devices, the application scenario of deep neural network applications has grown far beyond the high-performance platforms in their infancy. From computer vision (Redmon et al 2016) to image processing (Vardhana et al 2018), from audio analysis (Sehgal and Kehtarnavaz 2018) to natural language processing (Goldberg 2017), various edge portable and low-power embedded platforms represented by smartphones have gradually become the main processing platforms for deep learning applications. The efficient and timely processing of deep learning applications on these embedded platforms has gradually become an increasingly important optimization design problem in deep learning research and practice.…”
Section: Neural Network Background (mentioning)
confidence: 99%
“…Noting that multi-core processors are used in modern smartphones, a DNN model can be run on a secondary thread to create the needed computational bandwidth on the main thread to run the app at a desired FPS. This technique was used previously in [11] to allow a DNN model to run on a parallel thread by removing the computation burden from the main audio thread and thus preventing any audio frames from being skipped.…”
Section: Multithreading (mentioning)
confidence: 99%
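
The multithreading idea in the statement above can be illustrated with a minimal sketch. It is not the cited app's implementation: the names `run_pipeline` and `energy_vad` are hypothetical, a plain loop stands in for the real-time audio callback, and a simple energy threshold stands in for the CNN. The point it shows is only that the audio side hands frames to a worker thread through a bounded queue and never blocks on inference.

```python
import queue
import threading


def energy_vad(frame, threshold=0.01):
    """Hypothetical stand-in for the CNN: mean energy above a threshold."""
    return sum(x * x for x in frame) / len(frame) > threshold


def run_pipeline(frames, infer, max_pending=4):
    """Hand frames to a worker thread; return (decisions, dropped_count)."""
    pending = queue.Queue(maxsize=max_pending)
    decisions = []

    def worker():
        # Secondary thread: runs the (possibly slow) model on queued frames.
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more frames
                break
            index, frame = item
            decisions.append((index, infer(frame)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    dropped = 0
    for index, frame in enumerate(frames):  # stands in for the audio callback
        try:
            pending.put_nowait((index, frame))  # never block the audio side
        except queue.Full:
            dropped += 1  # the model fell behind; count the skipped frame

    pending.put(None)  # tell the worker to finish
    thread.join()
    return decisions, dropped


if __name__ == "__main__":
    silence, speech = [0.0] * 8, [0.5] * 8
    print(run_pipeline([silence, speech, silence], energy_vad, max_pending=8))
```

In this sketch a full queue means the frame is skipped for classification only; the audio loop itself keeps running, which is the property the statement attributes to moving the model off the main audio thread.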