2022
DOI: 10.1038/s41598-022-17497-1
|View full text |Cite
|
Sign up to set email alerts
|

Gun identification from gunshot audios for secure public places using transformer learning

Abstract: Increased mass shootings and terrorist activities severely impact society mentally and physically. Development of real-time and cost-effective automated weapon detection systems increases a sense of safety in public. Most of the previously proposed methods were vision-based. They visually analyze the presence of a gun in a camera frame. This research focuses on gun-type (rifle, handgun, none) detection based on the audio of its shot. Mel-frequency-based audio features have been used. We compared both convoluti… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
5
0

Year Published

2023
2023
2025
2025

Publication Types

Select...
5
1

Relationship

0
6

Authors

Journals

citations
Cited by 12 publications
(5 citation statements)
references
References 44 publications
0
5
0
Order By: Relevance
“…They made use of the 117 audio files that made up the UrbanSound collection. Their method showed the possibility of alternative deep learning architectures by achieving an accuracy of 93.87 percent [5]. In research by Junwoo Park and Youngwoo Cho, distinct audio samples from warfare settings were used to categorize gunshots in video games.…”
Section: Related Workmentioning
confidence: 99%
See 1 more Smart Citation
“…They made use of the 117 audio files that made up the UrbanSound collection. Their method showed the possibility of alternative deep learning architectures by achieving an accuracy of 93.87 percent [5]. In research by Junwoo Park and Youngwoo Cho, distinct audio samples from warfare settings were used to categorize gunshots in video games.…”
Section: Related Workmentioning
confidence: 99%
“…Each fold serves as the validation set once. This process is repeated k times, and the average performance over all folds is calculated to get the final accuracy [5] [14].…”
Section: Proposed Modelmentioning
confidence: 99%
“…These results were obtained by using no temporal pooling between convolutional layers. This minimizes the input temporal RF of the network down to 7 frames, as illustrated in Figure 8, which means that the CNN output features correspond to input T-F regions with smaller amounts of overlaps 6 . Hence, more temporal resolution is available to SSRP to distinguish salient T-F regions from other less relevant regions.…”
Section: Impact Of Hyper-parametersmentioning
confidence: 99%
“…In recent years, development of Environment Sound Classification (ESC) methods has been one of the hottest topics in audio classification domain due to its potential use in various application areas such as smart cities [1], audio surveillance systems [2], health care [3], security control systems [4], and Internet of Things (IoT) [5]. For example, ESC can be used to automatically identify different environmental sound events, such as gun shots [6], siren [7], and bird sounds [8]. The quite limited knowledge of temporal and frequency characteristics, the lower signal to noise ratio, and less static patterns make ESC more challenging than other audio-related tasks such as music genre classification and speech recognition.…”
Section: Introductionmentioning
confidence: 99%
“…Cybersecurity professionals can use gunfire recognition technology to identify and trace the sound source to prevent future attacks. Furthermore, these technologies can detect gunshot origins in public places and serve as an early warning system for law enforcement agencies [4]. This could help them respond better and quickly to potential threats while ensuring public safety.…”
Section: Introductionmentioning
confidence: 99%