2021
DOI: 10.1101/2021.02.09.21251454
Preprint

SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning

Abstract: The increase of social media usage across the globe has fueled efforts in digital epidemiology for mining valuable information such as medication use, adverse drug effects and reports of viral infections that directly and indirectly affect population health. Such information can, however, be scarce, hard to find and mostly expressed in very colloquial language. In this work, we focus on a fundamental problem that enables social media mining for disease monitoring. We present and make available SEED, …

citations
Cited by 7 publications
(16 citation statements)
references
References 49 publications
“…Another relaxation method was to decrease the amount of potential labels to only include top-level categories. 47,49,50,56 Even when grouping and comparing classification scores for the three classification tasks separately, it is not straightforward to determine the best-performing algorithm within each category. Algorithm performance reported on a domain-specific dataset labelled by a group of researchers according to their own annotation guidelines may vary when the same algorithm is tested on completely new and therefore independent data labelled by different persons.…”
Section: Evaluation Of Algorithms (mentioning)
confidence: 99%
“…42 Other features of unstructured data that cause challenges are colloquialisms, abbreviations, spelling errors, and other variations that appear in natural language. 41,56,65,66 Chee et al 65 describe heterogeneity not within a data source, but between them. Due to different text lengths and/or languages it becomes hard to achieve knowledge- or domain-transfer between sources such as Twitter and online forum entries.…”
Section: Practical Challenges and Research Gaps That Constitute Barri... (mentioning)
confidence: 99%
“…They further classified these posts while performing NER to obtain mitigation types (such as distancing, disinfection, personal protective equipment) and detection types (such as symptoms, testing) and analyzed for a certain time period the change of people's sentiments toward masks in these posts. For disease monitoring, Magge et al (76) built a system to collect symptoms and disease mentions from social media platforms and normalized them to unified medical language system (UMLS) terminology. Using deep learning methods (such as BERT and RoBERTa) that were trained on multiple available corpora (such as TwiMed, MedNorm, DS-NER), they achieved an F1-score of 0.86 and 0.75 on DailyStrength and Twitter datasets, respectively.…”
Section: Disease Monitoring (mentioning)
confidence: 99%
“…Roughly, different studies for symptom identification were concerned with general text that is not related directly to the medical or clinical text. For instance, Magge et al [37] developed a framework called "SEED" for symptom extraction from social media posts. The authors implemented deep learning and transfer learning approach that achieved an f1-score of 85%.…”
Section: Literature Review (mentioning)
confidence: 99%