Limitations of Weak Labels for Embedding and Tagging

Turpault, Nicolas; Serizel, Romain

doi:10.1109/icassp40776.2020.9053160

Cited by 10 publications

(4 citation statements)

References 20 publications

(24 reference statements)


“…This is referred to as label density noise [77], defined as a measure of the weakness of labels for a given weakly labeled clip. The impact and limitations of weak labels in sound event recognition (SER) are discussed in [32,78]. Audio processing can be done by handling the variable-length clips as is, or by slicing the clips into fixed-length patches.…”

Section: A. Characteristics (mentioning)

confidence: 99%

FSD50K: An Open Dataset of Human-Labeled Sound Events

Fonseca, Favory, Pons et al. 2022

IEEE/ACM Trans. Audio Speech Lang. Process.

206


Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes. However, AudioSet is not an open dataset as its official release consists of pre-computed audio features. Downloading the original audio tracks can be problematic due to YouTube videos gradually disappearing and usage rights issues. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51k audio clips totalling over 100h of audio manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the particularities of Freesound data, including challenges encountered and solutions adopted. We include a comprehensive dataset characterization along with discussion of limitations and key factors to allow its audio-informed usage. Finally, we conduct sound event classification experiments to provide baseline systems as well as insight on the main factors to consider when splitting Freesound audio data for SER. Our goal is to develop a dataset to be widely adopted by the community as a new open benchmark for SER research.



“…Fries et al (2019) used unlabeled cardiac MRI sequences for weakly supervised classification of aortic valve malformations, and Wu et al (2017) proposed using a new migration learning-based multi-instance learning (TMIL) framework to solve the multi-instance migration learning problem with both the source and target tasks containing weak labels. However, research into the application of weakly labeled industrial datasets to regression problems is still in its early stages (Turpault et al, 2020).…”

Section: Related Work, 2.1 Weakly Supervised Learning (mentioning)

confidence: 99%

A weakly supervised pairwise comparison learning approach for bearing health quantitative evaluation and remaining useful life prediction

Zhao, Cui, Yuan et al. 2023


Purpose: The purpose of this paper is to present a weakly supervised learning method to perform health evaluation and predict the remaining useful life (RUL) of rolling bearings. Design/methodology/approach: Based on the principle that bearing health degrades with the increase of service time, a weak label qualitative pairing comparison dataset for bearing health is extracted from the original time series monitoring data of bearing. A bearing health indicator (HI) quantitative evaluation model is obtained by training the delicately designed neural network structure with bearing qualitative comparison data between different health statuses. The remaining useful life is then predicted using the bearing health evaluation model and the degradation tolerance threshold. To validate the feasibility, efficiency and superiority of the proposed method, comparison experiments are designed and carried out on a widely used bearing dataset. Findings: The method achieves the transformation of bearing health from qualitative comparison to quantitative evaluation via a learning algorithm, which is promising in industrial equipment health evaluation and prediction. Originality/value: The method achieves the transformation of bearing health from qualitative comparison to quantitative evaluation via a learning algorithm, which is promising in industrial equipment health evaluation and prediction.


“…The longer the clips, the higher the so-called label density noise [77] as there is less certainty of where the labeled event is actually happening. The impact and limitations of weak labels in SER are discussed in [32,78]. In the context of deep networks, clips' variable length implies that audio processing must be done either using fixed-length patches or utilizing variable-length inputs.…”

Section: A. Characteristics (mentioning)

confidence: 99%

FSD50K: An Open Dataset of Human-Labeled Sound Events

Fonseca, Favory, Pons et al. 2020

Preprint


Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on a massive amount of audio tracks from YouTube videos and encompassing over 500 classes of everyday sounds. However, AudioSet is not an open dataset; its release consists of pre-computed audio features (instead of waveforms), which limits the adoption of some SER methods. Downloading the original audio tracks is also problematic due to constituent YouTube videos gradually disappearing and usage rights issues, which casts doubts over the suitability of this resource for systems' benchmarking. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51k audio clips totalling over 100h of audio manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the particularities of Freesound data, including challenges encountered and solutions adopted. We include a comprehensive dataset characterization along with discussion of limitations and key factors to allow its audio-informed usage. Finally, we conduct sound event classification experiments to provide baseline systems as well as insight on the main factors to consider when splitting Freesound audio data for SER. Our goal is to develop a dataset to be widely adopted by the community as a new open benchmark for SER research.

