2021
DOI: 10.1038/s41597-021-00937-4

The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms

Abstract: Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. The COUGHVID dataset provides over 25,000 crowdsourced cough recordings representing a wide range of participant ages, genders, geographic locations, and COVID-19 statuses. First, we contribute our open-sourced cough detection algorithm to the research community to assist in data robust…

Cited by 202 publications (174 citation statements)
References 24 publications

“…Moreover, as participants were instructed to produce sustained vowels with a continuous phonation over a certain time, it may introduce discontinuities in the pulmonic airstream in COVID-19 infected participants leading to sporadic, unintended interruptions of phonation when expressed the polysyllabic and the ‘ah’ sounds as compared to the cough sound 29 . Interestingly, as far as we know, most of the studies using voice to classify the presence of COVID-19 have utilized cough sounds as the study features 30 32 . It is therefore worthwhile to further explore speeches and other voice types which may have higher information content and better classification performance than cough sounds per se.…”
Section: Discussion (mentioning)
confidence: 99%
“…Several studies have already explored the usability of voice, cough and breathing for detection and screening of COVID-19 [ [25] , [26] , [27] , [28] , [29] , [30] , [31] ]. Crowdsourced dataset of cough and breathing samples is collected and used for distinguishing between individuals tested positive and negative to COVID-19, as well as participants diagnosed with asthma [ 25 ].…”
Section: Introduction (mentioning)
confidence: 99%
“…Large-scale crowdsourced cough dataset externally validated and labeled by expert physicians was collected within the COUGHVID study. The labels include a diagnosis, severity level, and existence of audible anomalies in cough sounds, such as dyspnea, wheezing or nasal congestion [ 30 ]. Vocal biomarkers initially designed for detection of Alzheimer's disease were successfully used for identification of COVID-19 from forced cough recordings, showing the ability to almost perfectly detect even the asymptomatic cases [ 26 ].…”
Section: Introduction (mentioning)
confidence: 99%
“…Three individual datasets, which contained airway symptoms of interests, were curated from laboratory-generated or public sources. These datasets were from: (1) a study of reading a standard passage scripted with airway symptom productions (Rainbow Passage dataset), (2) a published study of vocal loading tests (Vocal Stress dataset) (9) and (3) a crowdsourcing COVID-19 cough sound project (COUGHVID dataset) (21).…”
Section: Methods (mentioning)
confidence: 99%
“…Cough is one most common symptoms in airway disease diagnosis and monitoring. To further evaluate our AI algorithm, a highly heterogeneous dataset of coughs containing more than 20,000 recordings were collected from the COUGHVID crowdsourcing dataset (21). The predictions of the classifiers were already stored in the original COUGHVID data files by (21).…”
Section: Methods (mentioning)
confidence: 99%