2020
DOI: 10.1109/access.2020.3022058

Visual Object Detector for Cow Sound Event Detection

Abstract: Sound event detection (SED) is a reasonable choice in a number of application domains including cattle sheds, dense forests, or any dark environments where visual objects are usually concealed or invisible. This study presents an autonomous monitoring system based on sound characteristics developed for welfare management in large cattle farms. Two types of artificial audio datasets are prepared: the cow sound event dataset and the UrbanSound8K dataset, which are then used with various sound object detectors fo…

citations
Cited by 21 publications
(11 citation statements)
references
References 26 publications
“…Deep learning-based approaches, on the other hand, typically involve training a neural network on large amounts of annotated audio data and using the trained network to classify audio signals. SED has been applied in various fields, such as identifying bearing faults in noisy industrial environments using deep neural networks [ 8 ], recognizing the cow sound [ 45 , 46 ], and detecting respiratory diseases in patients’ voices [ 9 , 10 ]. More recently, advanced methods such as self-supervised learning [ 43 , 44 ], which uses large amounts of unlabeled audio data for pre-training a model, and multi-task learning [ 47 ], which involves a joint learning approach for sound event detection and localization, have been utilized to improve SED performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…The raw audio requires pre-processing in a suitable format before it is input into the deep neural networks for the MIR task. We used the log-Mel spectrogram for both classification and semantic segmentation tasks because it is proven and found efficient representation for audio classification [19], emotion recognition [20], and sound event detection [21,22]. The data pre-processing during training and testing time is designed to address the fixed-size input to the neural network and memory issues in processing the long audio sequence.…”
Section: Audio Representation (mentioning)
confidence: 99%
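
The log-Mel representation named in the statement above can be illustrated with a minimal NumPy sketch. This is not the citing authors' pipeline: the sample rate, FFT size, hop length, number of Mel bands, and the helper names (`mel_filterbank`, `log_mel_spectrogram`, `fix_length`) are all assumptions chosen for illustration. The `fix_length` helper shows one simple way to obtain the fixed-size network input the statement refers to.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Hz -> Mel conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers spaced evenly on the Mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr=16000, n_fft=512, hop=256, n_mels=40, eps=1e-10):
    # Assumed parameters; returns an array of shape (n_frames, n_mels) in dB.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel + eps)

def fix_length(spec, n_frames):
    # Crop or pad along the time axis so every clip has the same shape.
    if spec.shape[0] >= n_frames:
        return spec[:n_frames]
    pad = n_frames - spec.shape[0]
    return np.pad(spec, ((0, pad), (0, 0)), mode="constant", constant_values=spec.min())

if __name__ == "__main__":
    sr = 16000
    y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)  # 1 s synthetic tone
    spec = log_mel_spectrogram(y, sr=sr)
    print(spec.shape, fix_length(spec, 100).shape)
```

Padding with the spectrogram minimum is only one option; splitting long recordings into overlapping fixed-length windows is another common choice when memory is the constraint.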
“…Strong labels include onset and offset times in the given audio recordings, but weak labels give only the types of audio events. When incomplete information is given, the task is called weakly supervised classification [11][12][13][14][15][16][17]. Using weakly supervised learning with an incompletely labeled dataset might reduce the costs for dataset construction.…”
Section: Introduction (mentioning)
confidence: 99%
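
The distinction between strong and weak labels in the statement above can be made concrete with a short sketch. The `StrongLabel` type, the event names, and the 0.5 s frame hop are hypothetical and chosen only for illustration; they do not come from any of the cited papers.

```python
import math
from typing import List, NamedTuple

class StrongLabel(NamedTuple):
    event: str     # event class name (hypothetical examples below)
    onset: float   # start time in seconds
    offset: float  # end time in seconds

def to_weak(labels: List[StrongLabel]) -> List[str]:
    # A weak label keeps only which event types occur, not when.
    return sorted({lab.event for lab in labels})

def active_frames(labels: List[StrongLabel], event: str,
                  clip_len: float, hop: float) -> List[int]:
    # Frame i covers [i * hop, (i + 1) * hop); it is active if any
    # annotation of the given event overlaps that interval.
    n_frames = math.ceil(clip_len / hop)
    active = []
    for i in range(n_frames):
        start, end = i * hop, (i + 1) * hop
        if any(lab.event == event and lab.onset < end and lab.offset > start
               for lab in labels):
            active.append(i)
    return active

if __name__ == "__main__":
    strong = [
        StrongLabel("moo", 1.0, 2.5),
        StrongLabel("cough", 4.0, 4.6),
        StrongLabel("moo", 6.0, 7.0),
    ]
    print(to_weak(strong))
    print(active_frames(strong, "moo", clip_len=8.0, hop=0.5))
```

Going from strong to weak labels discards the timing information, which is why a weakly labeled dataset is cheaper to annotate but gives the model less supervision.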