2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
DOI: 10.1109/icassp.2019.8683147

HCU400: an Annotated Dataset for Exploring Aural Phenomenology through Causal Uncertainty

Abstract: The way we perceive a sound depends on many aspects: its ecological frequency, acoustic features, typicality, and most notably, its identified source. In this paper, we present the HCU400: a dataset of 402 sounds ranging from easily identifiable everyday sounds to intentionally obscured artificial ones. It aims to lower the barrier for the study of aural phenomenology as the largest available audio dataset to include an analysis of causal attribution. Each sample has been annotated with crowd-sourced description…

Cited by 5 publications (7 citation statements). References 16 publications.
“…Research into audio recall memorability shows that naming or verbalising sounds (phonological articulation) can improve recall [22]; accordingly, non-verbal sounds have lower recall than verbal sounds [23]. Emotionality is known to play an important role in memory formation, and the emotional impact of a sound is correlated with the clarity of its perceived source [24]. Sounds of human activity are perceived as positively valenced [25], and positive valence improves sound recall [26].…”
Section: Auditory Memorability
Citation type: mentioning; confidence: 99%
“…We use the PANNs [38] network to generate audio tags, labelling the audio as music (giving it a score of 1.0) if a musical tag is present in the top 75% confidence. Hcu and arousal scores are independently predicted with ImageNet-pretrained xResNet34 models fine-tuned on spectrograms from the HCU400 dataset [24]. Due to limited available options, for familiarity we use the top audio-tag confidence score of the PANNs [38] network as a proxy (Spearman ρ = 0.305, p = 4.749e-10 between the two scores in the HCU400 dataset).…”
Section: B. Audio Gestalt
Citation type: mentioning; confidence: 99%
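For readers who want to see what that familiarity proxy looks like in practice, here is a minimal sketch. It assumes the panns_inference package (a public reference implementation of PANNs); both the reading of "top 75% confidence" (tags scoring at least 0.75 of the top score) and the MUSIC_TAGS set are illustrative assumptions, not the cited authors' code.

```python
# A minimal sketch, not the cited paper's implementation.
# Assumes the panns_inference package (pip install panns-inference);
# input audio is mono float32 at 32 kHz, shape (1, n_samples).
import numpy as np
from panns_inference import AudioTagging, labels
from scipy.stats import spearmanr

at = AudioTagging(checkpoint_path=None, device="cpu")  # pretrained CNN14 tagger

MUSIC_TAGS = {"Music", "Musical instrument"}  # assumed subset of AudioSet labels


def tag_clip(audio: np.ndarray):
    clipwise_output, _embedding = at.inference(audio)
    scores = clipwise_output[0]  # (527,) AudioSet class scores
    # One plausible reading of "musical tag in the top 75% confidence":
    # any musical label whose score reaches 0.75 of the top score.
    confident = {labels[i] for i in np.where(scores >= 0.75 * scores.max())[0]}
    is_music = bool(confident & MUSIC_TAGS)
    familiarity_proxy = float(scores.max())  # top-tag confidence as proxy
    return is_music, familiarity_proxy


def validate_proxy(proxy_scores, human_familiarity):
    # Rank correlation between the proxy and human ratings
    # (the excerpt reports Spearman rho = 0.305 on HCU400).
    return spearmanr(proxy_scores, human_familiarity)
```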
“…When choosing a set of data to validate our hypothesis, two aspects are of the utmost importance. Firstly, we are interested in video memorability, and therefore we exclude any image-only or audio-only corpora [16][17][18]. Secondly, we require that every video sample is accompanied by at least one textual description.…”
Section: Datasets
Citation type: mentioning; confidence: 99%
“…Following the experimental psychology literature [4,19], previous work has defined and quantified causal uncertainty in a dataset called HCU400 [1]. The authors of the work curated a set of approximately 400 sounds that intentionally span the spectrum of source ambiguity (from natural, environmental sounds to artificially synthesized sounds), and obtained crowd-sourced annotations for the labels corresponding to each sound.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
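For concreteness, causal uncertainty in this literature is typically quantified as the Shannon entropy of the distribution of source labels that annotators assign to a sound: higher entropy means a more ambiguous source. The sketch below is a minimal illustration assuming free-form labels have already been grouped into discrete source categories; HCU400's actual pipeline includes that clustering step.

```python
# A minimal sketch, assuming annotations are already clustered into
# discrete source categories (HCU400 itself performs that clustering).
import math
from collections import Counter


def causal_uncertainty(source_labels: list[str]) -> float:
    """Shannon entropy (bits) of the source-label distribution for one sound."""
    counts = Counter(source_labels)
    n = len(source_labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A sound everyone attributes to the same source has 0 bits of causal
# uncertainty; maximal disagreement over k sources gives log2(k) bits.
print(causal_uncertainty(["dog"] * 4))                         # 0.0
print(causal_uncertainty(["dog", "wolf", "seal", "machine"]))  # 2.0
```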
“…To this end, we present a method for changing a sound's causal uncertainty by optimization over perturbations in its acoustic properties. Unlike in the ecological audition or psychology literature, we cannot practically compute causal uncertainty by human label annotation and consensus [1,4,19,23]. Instead, following the proposal in [2], we use the uncertainty of a pre-trained audio classification model released by Google, called YAMNet [15], as a proxy for human causal uncertainty (see Section 2).…”
Section: Introduction
Citation type: mentioning; confidence: 99%
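A minimal sketch of such a model-based proxy follows, assuming YAMNet from TensorFlow Hub and Shannon entropy of the averaged class scores as the uncertainty measure; the excerpt does not specify the exact uncertainty functional used in [2], so the entropy choice here is an assumption. YAMNet emits per-frame sigmoid scores over 521 AudioSet classes, so the averaged scores are renormalized before computing entropy.

```python
# A minimal sketch, assuming classifier entropy is the uncertainty proxy.
import numpy as np
import tensorflow_hub as hub

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")


def model_causal_uncertainty(waveform: np.ndarray) -> float:
    """Entropy (bits) of YAMNet's averaged class scores for one clip.

    waveform: mono float32 samples in [-1, 1] at 16 kHz.
    """
    scores, _embeddings, _spectrogram = yamnet(waveform)
    p = scores.numpy().mean(axis=0)  # average per-frame scores (521 classes)
    p = p / p.sum()                  # renormalize: YAMNet scores are sigmoids
    return float(-(p * np.log2(p + 1e-12)).sum())
```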