Seeing Sound
2017
DOI: 10.1145/3134664

Abstract: Audio annotation is key to developing machine-listening systems; yet, effective ways to accurately and rapidly obtain crowdsourced audio annotations are understudied. In this work, we seek to quantify the reliability/redundancy trade-off in crowdsourced soundscape annotation, investigate how visualizations affect accuracy and efficiency, and characterize how performance varies as a function of audio characteristics. Using a controlled experiment, we varied sound visualizations and the complexity of soundscapes …
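The reliability/redundancy trade-off named in the abstract can be made concrete with a small simulation: aggregating redundant crowd labels by majority vote and observing how estimated reliability grows with the number of annotators. This is an illustrative sketch only, not the paper's method; the annotator accuracy `p` and the label names are assumed parameters.

```python
# Illustrative sketch (not the paper's method): majority-vote aggregation of
# redundant crowd labels for one sound event. Shows how the chance of
# recovering the true label grows as more annotators label the same clip.
import random

def majority_vote(labels):
    """Return the label chosen by most annotators (ties broken arbitrarily)."""
    return max(set(labels), key=labels.count)

def simulate(p=0.7, redundancy=5, trials=10_000, seed=0):
    """Fraction of trials in which the majority vote recovers the true label,
    assuming each annotator is independently correct with probability p."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        votes = ["present" if rng.random() < p else "absent" for _ in range(redundancy)]
        correct += majority_vote(votes) == "present"
    return correct / trials

for r in (1, 3, 5, 9):
    print(f"{r} annotators -> aggregated accuracy {simulate(redundancy=r):.3f}")
```

Higher redundancy buys reliability at a linear cost in annotation effort, which is exactly the trade-off the study sets out to quantify.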

Cited by 42 publications (9 citation statements)
References 21 publications
“…DHH participants in prior work liked waveforms while recording samples in a lab setting [8]; we explore their value for samples recorded in daily life. Spectrograms show the frequency spectrum over time, are often used for scientific analyses (e.g., bioacoustics [16]), and can be difficult to interpret for novice hearing users [12,35]. Early work showed frequency information was inadequate for DHH users in a sound identification task [54]; we briefly explore DHH participants' opinions of spectrograms for displaying sound activity.…”
Section: Methods
confidence: 99%
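The excerpt above contrasts waveforms and spectrograms as displays of sound activity. As a concrete illustration of what a spectrogram shows, the sketch below computes and plots one for a short audio excerpt; the file name and analysis parameters are assumptions for illustration, not taken from the cited work.

```python
# Minimal sketch: computing a spectrogram (frequency content over time) for a
# short audio excerpt with scipy.signal.spectrogram. "excerpt.wav" and the
# window settings are hypothetical.
import numpy as np
from scipy.io import wavfile
from scipy import signal
import matplotlib.pyplot as plt

rate, samples = wavfile.read("excerpt.wav")   # hypothetical input file
if samples.ndim > 1:                          # mix down to mono if stereo
    samples = samples.mean(axis=1)

# Short-time Fourier analysis: ~23 ms windows at 44.1 kHz with 50% overlap.
freqs, times, sxx = signal.spectrogram(samples, fs=rate, nperseg=1024, noverlap=512)

# Plot power on a log scale so quiet sound events remain visible.
plt.pcolormesh(times, freqs, 10 * np.log10(sxx + 1e-12), shading="auto")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Spectrogram of a short soundscape excerpt")
plt.show()
```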
“…Five-second excerpts were extracted from recordings in these collections. To manually label them into the target classes, we enhanced the web-based audio annotator tool [15] so that it can be controlled exclusively by the keyboard. This makes labelling very fast when an excerpt contains just one class (e.g., speech).…”
Section: Dataset
confidence: 99%
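Slicing longer recordings into fixed-length excerpts, as described in the excerpt above, is straightforward to do programmatically. The following is a minimal sketch assuming the soundfile library; file names and the output naming scheme are illustrative, not those used by the cited authors.

```python
# Minimal sketch: splitting a longer field recording into consecutive
# five-second excerpts and saving each one for annotation.
import soundfile as sf

def extract_excerpts(path, excerpt_s=5.0):
    """Split a recording into consecutive fixed-length excerpts and save them."""
    data, rate = sf.read(path)                  # load the full recording
    hop = int(excerpt_s * rate)                 # samples per excerpt
    for i, start in enumerate(range(0, len(data) - hop + 1, hop)):
        clip = data[start:start + hop]
        sf.write(f"excerpt_{i:04d}.wav", clip, rate)

extract_excerpts("field_recording.wav")         # hypothetical input file
```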
“…As such, finding methods that both engage citizen scientists and allow the swift and accurate categorisation of complex vocalisations will provide an advantage over the output of a much smaller number of experts labouring alone, while also advancing ecological science in the public sphere. Some of the methods for acoustic annotation being explored include the pairing of short snippets of sound with visualisations such as spectrograms [3,8], or just providing visualisations [9]. While this presents an advantage for representing certain species' calls (e.g.…”
Section: Introduction
confidence: 99%
“…However, many forms of machine learning rely upon human intelligence to provide the pre-labelled datasets that they are trained upon, and the production of these annotated datasets is a time-consuming process [2]. While crowd-sourced human intelligence offers a potential solution, issues remain in terms of participant accuracy and efficiency [3,4], as well as how to motivate continued involvement in the task. However, citizen science offers additional benefits to ecological projects, such as engaging the public with scientific processes [5] and conservation agendas [6,7].…”
Section: Introduction
confidence: 99%