2020
DOI: 10.48550/arxiv.2009.11644
Preprint

The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms

Lara Orlandic, Tomas Teijeiro, David Atienza

Abstract: Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. However, there is currently no validated database of cough sounds with which to train such ML models. The COUGHVID dataset provides over 20,000 crowdsourced cough recordings representing a wide range of subject ages, genders, geographic locations, and COVID-19 statuses. First, we filter…

Cited by 14 publications (23 citation statements) · References 14 publications
“…In [25], Orlandic L et al implemented the "COUGHVID" crowdsourced dataset for cough analysis with COVID-19 symptom; More than twenty thousand crowdsourced cough recordings reflected a broad range of topic gender, age, geographic locations, and COVID-19 status was given in the COUGHVID dataset. They have collected a series of 121 cough sounds and 94 no-cough sounds first-hand to train the classifier includes voice, laughter, silence, and various background noises [26].…”
Section: Background Work · mentioning (confidence: 99%)
“…This ensures higher quality ground truth labels, avoids potential target leakage into the self-reported data and cough samples due to subliminal effects of an aforeknown diagnosis [30], and eliminates issues related to spectral characteristics of the audio recordings made on different hardware with different software filtering and compression. Other studies crowdsource the data through web or mobile apps, which is a less expensive and time-consuming option that yields much larger datasets, albeit of lesser quality both in the ground truth infection status labels and the audio recordings themselves, [33,34].…”
Section: A. Related Work · mentioning (confidence: 99%)
“…As of the time of writing, only three large cough datasets featuring COVID-19 positive samples were publicly available: the EPFL COUGHVID dataset [34], Coswara [39], and Covid19-Cough [44]. The EPFL dataset comprises of approximately 20000 records.…”
Section: A. Open Datasets · mentioning (confidence: 99%)
“…During the pandemic, many crowdsourcing platforms (such as COUGHVID² [24], COVID Voice Detector³, and COVID-19 Sounds App⁴) have been designed to gather respiratory sound audios from both healthy and COVID-19 positive groups for the research purpose. With these collected datasets, researchers in the artificial intelligence community have started to develop machine learning and deep learning based methods (e.g., [5,12,17,25,27]) for cough classification to detect COVID-19.…”
Section: Introduction · mentioning (confidence: 99%)