A Preliminary Study on Deep-Learning Based Screaming Sound Detection

Zaheer, Md. Zaigham; Kim, Jin Young; Kim, Hyoung‐Gook; Na, Seung You

doi:10.1109/icitcs.2015.7292925

Cited by 14 publications (8 citation statements); references 4 publications.


“…Studies before 2015 generally used Gaussian mixture models or support vector machines to classify audio data points [18,19,21]. More recent works have tested deep-learning classifiers like deep Boltzmann machines and deep belief networks [22,23]. Additionally, recent studies have tested robustness to noise, using both artificially generated noise and naturally occurring environmental noise [17,23].…”

Section: Technical Work (mentioning)

confidence: 99%

“…Previous studies use a wide range of training data. Due to the scarcity of publicly available audio databases, training data sets generally fall into 1 of 2 categories: "self-compiled" [18][19][20][21] or "self-recorded" [17,22,23]. Self-compiled databases use audio samples from sound effect websites, movies, or other sources accessible to researchers.…”

Section: Technical Work (mentioning)

confidence: 99%

“…These data sets also tend to be narrow in scope, limiting generalizability. For example, Nandwana et al [17] stitched together examples of speech and screaming from 6 male speakers to form 24 continuous recordings, and Zaheer et al [22] recorded 130 scream sounds and 110 "ah" sounds from 60 people in outdoor settings.…”

Section: Technical Work (mentioning)

confidence: 99%


Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning

O'Donovan, Sezgin, Bambach, et al. 2020

JMIR Form Res


Background: Qualitative self- or parent-reports used in assessing children's behavioral disorders are often inconvenient to collect and can be misleading due to missing information, rater biases, and limited validity. A data-driven approach to quantify behavioral disorders could alleviate these concerns. This study proposes a machine learning approach to identify screams in voice recordings that avoids the need to gather large amounts of clinical data for model training.

Objective: The goal of this study is to evaluate if a machine learning model trained only on publicly available audio data sets could be used to detect screaming sounds in audio streams captured in an at-home setting.

Methods: Two sets of audio samples were prepared to evaluate the model: a subset of the publicly available AudioSet data set and a set of audio data extracted from the TV show Supernanny, which was chosen for its similarity to clinical data. Scream events were manually annotated for the Supernanny data, and existing annotations were refined for the AudioSet data. Audio feature extraction was performed with a convolutional neural network pretrained on AudioSet. A gradient-boosted tree model was trained and cross-validated for scream classification on the AudioSet data and then validated independently on the Supernanny audio.

Results: On the held-out AudioSet clips, the model achieved a receiver operating characteristic (ROC)–area under the curve (AUC) of 0.86. The same model applied to three full episodes of Supernanny audio achieved an ROC-AUC of 0.95 and an average precision (positive predictive value) of 42% despite screams only making up 1.3% (n=92/7166 seconds) of the total run time.

Conclusions: These results suggest that a scream-detection model trained with publicly available data could be valuable for monitoring clinical recordings and identifying tantrums as opposed to depending on collecting costly privacy-protected clinical data for model training.
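The two metrics in this abstract react differently to class imbalance. ROC-AUC does not depend on prevalence, and a random scorer lands near 0.5. The expected average precision of a random scorer, by contrast, is roughly the positive rate, here 92/7166 ≈ 0.013, so the reported 42% should be judged against that baseline. The following is a minimal plain-Python sketch of how both metrics can be computed. It assumes one score and one 0/1 label per second of audio. It is not the study's evaluation code, and the toy scores are made up.

```python
# Minimal sketch (not the study's evaluation code): ROC-AUC and average
# precision for a rare-event detector, from one score and one 0/1 label
# per second of audio (1 = scream).
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative
    (the Mann-Whitney formulation of ROC-AUC); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Mean of the precision values at the rank of each positive, after
    sorting by descending score (equal scores keep their input order)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_pos


if __name__ == "__main__":
    # Made-up toy scores, not data from the study.
    labels = [0, 0, 1, 0, 1, 0, 0, 0]
    scores = [0.1, 0.4, 0.35, 0.2, 0.9, 0.3, 0.05, 0.6]
    print(roc_auc(labels, scores))            # 0.8333...
    print(average_precision(labels, scores))  # 0.75
    # Expected AP of a random scorer is about the positive rate:
    print(92 / 7166)                          # about 0.0128
```

Measured against that baseline, the reported 42% is roughly 33 times the random expectation (0.42 / 0.0128 ≈ 33). The pairwise loop in `roc_auc` costs O(positives × negatives). That is still small at the scale in the abstract (92 × 7074 pairs), but a rank-based formula would suit much larger evaluations better.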


“…In more complicated and complex signals such as speech or music where the signal changes its properties over time, it is evidently more meaningful to refer to the altering frequency content over a smaller time interval than an infinite time interval.
Spectral Flux: E. R. Siebert et al [29], L. Gerosa et al [2], M. Z. Zaheer et al [14], R. A. Breguet et al [31]
Spectral Tilt: L. Gerosa et al [2], R. A. Breguet et al [31], C. Zhang et al [25]
Spectral Entropy: M. Mark et al [21], A. Pillai et al [8], N. Hayasaka et al [4], W. Liao et al [23]
Signal Bandwidth: M. Mark et al [21], W. Liao et al [23]
Sub-Band Energy Ratio: J. H. L. Hansen et al [1], C. Chan et al [22], M. Z. Zaheer et al [14], C. Zhang et al [25]
Linear Prediction: P. C. Schön et al [27], N. E. O'Connor et al [30]
Prosodic:
Pitch/Fundamental Frequency: M. Mark et al [21], L. H. Arnal et al [7], C. Chan et al [22], L. Gerosa et al [2], J. H. L. Hansen et al [13], M. Z. Zaheer et al [14], K. Kato [19], B. Uzkent et al [20], W. Liao et al [23]
Loudness/Intensity: L. Gerosa et al [2], K. Kato [19], C. Zhang et al [25]
Rhythm/Duration: C. Chan et al [22], K. Kato [19], C. Zhang et al [25]
Log Energy: N. Hayasaka et al [4], W. Huang et al [3]…”

Section: B. Analysis of Scream Sound Features (mentioning)

confidence: 99%
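Several features in the quoted list have short, widely used definitions. The following is a minimal plain-Python sketch that computes four of them for a single frame: log energy, spectral entropy, spectral flux, and a sub-band energy ratio. Several choices here are assumptions made for illustration: the Hann window, the squared-difference form of spectral flux, and the normalization of entropy to [0, 1]. The cited papers may define these features differently.

```python
# Minimal sketch: a few frame-level features from the quoted list, using
# common textbook definitions (assumed; the cited papers may differ).
import cmath
import math
import random
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("frame length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """One-sided power spectrum (bins 0..N/2) of a Hann-windowed frame."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    spec = fft([s * w for s, w in zip(frame, window)])
    return [abs(c) ** 2 for c in spec[: n // 2 + 1]]


def log_energy(frame: Sequence[float], eps: float = 1e-12) -> float:
    """Natural log of the frame's sum of squared samples."""
    return math.log(sum(s * s for s in frame) + eps)


def spectral_entropy(power: Sequence[float]) -> float:
    """Shannon entropy of the power spectrum treated as a probability
    distribution, divided by its maximum log(len(power)) to lie in [0, 1]."""
    total = sum(power)
    if total == 0:
        return 0.0  # silent frame: no spectral distribution to measure
    h = -sum((p / total) * math.log(p / total) for p in power if p > 0)
    return h / math.log(len(power))


def spectral_flux(prev_power: Sequence[float], power: Sequence[float]) -> float:
    """Sum of squared differences between consecutive magnitude spectra."""
    return sum((math.sqrt(c) - math.sqrt(p)) ** 2
               for p, c in zip(prev_power, power))


def sub_band_energy_ratio(power: Sequence[float], sample_rate: float,
                          lo_hz: float, hi_hz: float) -> float:
    """Fraction of spectral energy in bins with lo_hz <= frequency < hi_hz."""
    total = sum(power)
    if total == 0:
        return 0.0
    n_fft = 2 * (len(power) - 1)
    band = sum(p for k, p in enumerate(power)
               if lo_hz <= k * sample_rate / n_fft < hi_hz)
    return band / total


if __name__ == "__main__":
    sr, n = 16000, 512
    tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(n)]
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    p_tone, p_noise = power_spectrum(tone), power_spectrum(noise)
    # A pure tone concentrates energy in a few bins, so its entropy is low.
    print(spectral_entropy(p_tone), spectral_entropy(p_noise))
    print(sub_band_energy_ratio(p_tone, sr, 500.0, 2000.0))
```

A detector would typically compute these features for successive, overlapping frames and pass the resulting vectors to a classifier. The frame length, hop size, and band edges used here are illustrative assumptions.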

“…M.Z. Zaheer et al [14] achieved 100% scream detection accuracy with GMM technique. Another classification technique used by N. Hayasaka et al [4] achieved an accuracy rate of 99% again with GMM.…”

Section: Unsupervised Learning Algorithms (mentioning)

confidence: 99%
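The quoted accuracy figures come from Gaussian mixture model (GMM) classifiers. One common GMM setup fits one mixture per class with expectation-maximization (EM) and labels an input with whichever class model assigns it the higher log-likelihood. The following is a minimal sketch of that setup under stated assumptions: a single scalar feature per frame, two components per class, and synthetic data. The cited systems used their own feature sets and configurations, which this sketch does not reproduce.

```python
# Minimal sketch (assumed setup, synthetic data): one 1-D Gaussian mixture
# per class fitted with EM; a value is labelled with the class whose model
# gives it the higher log-likelihood.
import math
import random
from typing import List, Sequence, Tuple

Gmm = Tuple[List[float], List[float], List[float]]  # weights, means, variances


def _log_normal(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def _logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(x: float, gmm: Gmm) -> float:
    """log p(x) under the mixture: log sum_k w_k N(x; mu_k, var_k)."""
    w, mu, var = gmm
    return _logsumexp([math.log(w[k]) + _log_normal(x, mu[k], var[k])
                       for k in range(len(w))])


def fit_gmm(data: Sequence[float], k: int = 2, iters: int = 100,
            seed: int = 0, min_var: float = 1e-3) -> Gmm:
    """Fit a k-component 1-D GMM with expectation-maximization."""
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    var0 = max(sum((x - mean) ** 2 for x in data) / n, min_var)
    w, mu, var = [1.0 / k] * k, rng.sample(list(data), k), [var0] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = [math.log(w[j]) + _log_normal(x, mu[j], var[j])
                    for j in range(k)]
            norm = _logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                             for r, x in zip(resp, data)) / nj, min_var)
    return w, mu, var


def classify(x: float, scream: Gmm, other: Gmm) -> int:
    """1 (scream) if the scream model is more likely than the other model."""
    return int(log_likelihood(x, scream) > log_likelihood(x, other))


if __name__ == "__main__":
    # Synthetic stand-in values, not real scream features.
    rng = random.Random(1)
    scream_train = [rng.gauss(4, 1) for _ in range(300)]
    other_train = [rng.gauss(-1, 0.5) if rng.random() < 0.5 else rng.gauss(1, 0.5)
                   for _ in range(300)]
    scream_gmm, other_gmm = fit_gmm(scream_train), fit_gmm(other_train)
    print(classify(3.8, scream_gmm, other_gmm), classify(0.2, scream_gmm, other_gmm))
```

The decision compares two class-conditional likelihoods. Moving the threshold on their log-likelihood ratio away from zero therefore trades missed screams against false alarms. This sketch leaves that threshold fixed.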

A Review on Scream Classification for Situation Understanding

Nazir, Awais, Malik, et al. 2018

IJACSA


Abstract: In our living environment, non-speech audio signals provide significant evidence for situation awareness and complement the information obtained from video signals. Among non-speech audio events, screaming is of particular interest to people such as security guards, caretakers, and family members for care and surveillance, because screams are automatically regarded as a sign of danger. Departing from this view, this review targets automated acoustic systems for the non-speech class of screams, on the premise that screams can be further classified into categories such as happiness, sadness, fear, and danger. Drawing on the prevailing work in scream audio detection and classification, the review proposes a taxonomy that highlights target applications, significant sound features, classification techniques, and their impact on classification problems over the last few decades. It is intended to help researchers select the most appropriate scream detection and classification techniques and acoustic parameters for scream classification, which can help in understanding the vocalization condition of the speaker.
