2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2017.7952261
Audio Set: An ontology and human-labeled dataset for audio events

Cited by 2,019 publications (1,301 citation statements)
References 14 publications
“…VGGish [6] is an SED model, which is trained on AudioSet, a large-scale audio dataset containing 2,084,320 human-labeled 10-second audio clips [25]. We used the VGGish model as a feature extractor to convert the audio input into latent feature vectors and fed them as input to SVR models for emotion prediction.…”
Section: Emotion Recognition Of Wcmed and Ccmed Based On A Sound Even…
Citation type: mentioning; confidence: 99%
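The excerpt above describes a two-stage pipeline: per-frame VGGish embeddings are pooled into a clip-level feature vector, which is then fed to a regressor for emotion prediction. A minimal sketch of that shape, with two loud assumptions: the VGGish network is not reproduced (embeddings here are random stand-ins), and ordinary least squares stands in for the SVR named in the excerpt.

```python
import numpy as np

# Hypothetical stand-in for the pipeline in the excerpt: VGGish yields one
# 128-dimensional embedding per ~0.96 s frame; these are pooled into a single
# clip-level vector and fed to a regressor predicting an emotion rating.
# Real VGGish weights are NOT used here; embeddings are random placeholders.

rng = np.random.default_rng(0)

def pool_clip(frame_embeddings):
    """Average per-frame 128-d VGGish-style embeddings into one clip vector."""
    return frame_embeddings.mean(axis=0)

# 50 toy clips, each ~10 s -> about 10 frames of 128-d features per clip.
clips = [rng.normal(size=(10, 128)) for _ in range(50)]
X = np.stack([pool_clip(c) for c in clips])   # shape (50, 128)
y = rng.uniform(-1.0, 1.0, size=50)           # toy valence ratings

# Ordinary least squares as a simple stand-in for the paper's SVR models.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ w
print(X.shape, pred.shape)
```

Mean-pooling is one common way to collapse variable-length frame sequences into a fixed-size clip feature; the cited work's exact pooling and SVR hyperparameters are not specified in this excerpt.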
“…In total we analysed over 2750 hours of audio, collected using a variety of devices including AudioMoths 35 , Tascam recorders, Cornell Lab Swifts, and custom set-ups using commercial microphones (Methods). We then embedded each 0.96 second sample of eco-acoustic data in a 128-dimensional feature space using a CNN pre-trained on Google's AudioSet dataset 13,14 .…”
Section: A Common Feature Embedding Yields Multi-scale Ecological Ins…
Citation type: mentioning; confidence: 99%
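The embedding step in the excerpt above slices audio into 0.96 second windows before the CNN maps each window to a 128-dimensional vector; VGGish-style models operate on 16 kHz audio, so each window spans 15,360 samples. A framing sketch under those assumptions, with the AudioSet-pretrained CNN replaced by a hypothetical stub:

```python
import numpy as np

SAMPLE_RATE = 16_000                               # VGGish-style input rate
FRAME_SECONDS = 0.96                               # window per embedding
FRAME_SAMPLES = int(SAMPLE_RATE * FRAME_SECONDS)   # 15,360 samples

def frame_audio(waveform):
    """Split a mono waveform into non-overlapping 0.96 s frames."""
    n_frames = len(waveform) // FRAME_SAMPLES
    return waveform[: n_frames * FRAME_SAMPLES].reshape(n_frames, FRAME_SAMPLES)

def embed_stub(frames):
    """Hypothetical stand-in for the AudioSet-pretrained CNN: 128-d per frame.

    A real model would compute log-mel spectrogram patches and run them
    through the network; here a fixed random projection keeps the sketch
    self-contained.
    """
    rng = np.random.default_rng(0)
    proj = rng.normal(size=(FRAME_SAMPLES, 128)) / np.sqrt(FRAME_SAMPLES)
    return frames @ proj

audio = np.random.default_rng(1).normal(size=SAMPLE_RATE * 10)  # 10 s clip
frames = frame_audio(audio)
features = embed_stub(frames)
print(frames.shape, features.shape)  # -> (10, 15360) (10, 128)
```

A 10 second clip yields ten full 0.96 s windows (the trailing remainder is dropped), each reduced to one 128-dimensional feature vector.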
“…AudioSet is a collection of human-labelled sound clips, organised in an expanding ontology of audio events, which contains over two million short audio samples drawn from a wide range of sources appearing on YouTube. Although a small amount of eco-acoustic data is present, the vast majority of audio clips are unrelated to natural soundscapes 13 , with the largest classes consisting of music, human speech and machine noise. No ecological acoustic datasets provide labelled data on a similar magnitude to AudioSet, and when detecting 'unknown unknowns' it is in fact desirable to have a feature space that is able to efficiently capture characteristics of non-soundscape specific audio.…”
Section: A Common Feature Embedding Yields Multi-scale Ecological Ins…
Citation type: mentioning; confidence: 99%
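The expanding ontology mentioned in the excerpt above is distributed as a JSON list of event classes, each carrying an `id`, a human-readable `name`, and `child_ids` pointing to sub-classes. A toy sketch of walking such a hierarchy; the three entries and their ids are invented for illustration, not taken from the real ontology file.

```python
# Toy fragment shaped like the AudioSet ontology JSON (entries are invented).
ontology = [
    {"id": "/m/music",  "name": "Music",        "child_ids": ["/m/guitar"]},
    {"id": "/m/guitar", "name": "Guitar",       "child_ids": []},
    {"id": "/m/speech", "name": "Human speech", "child_ids": []},
]

# Index nodes by id so child links can be resolved in O(1).
by_id = {node["id"]: node for node in ontology}

def leaves(node_id):
    """Collect leaf class names under a node, depth-first."""
    node = by_id[node_id]
    if not node["child_ids"]:
        return [node["name"]]
    out = []
    for child_id in node["child_ids"]:
        out.extend(leaves(child_id))
    return out

print(leaves("/m/music"))   # -> ['Guitar']
```

This parent/child structure is what lets labels propagate up the hierarchy: a clip labelled with a leaf class is implicitly an instance of every ancestor class.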