2018
DOI: 10.1016/j.jbi.2018.10.001

Social media mining for birth defects research: A rule-based, bootstrapping approach to collecting data for rare health-related events on Twitter

Abstract: Background: Although birth defects are the leading cause of infant mortality in the United States, methods for observing human pregnancies with birth defect outcomes are limited. Objective: The primary objectives of this study were (i) to assess whether rare health-related events—in this case, birth defects—are reported on social media, (ii) to design and deploy a natural language processing (NLP) approach for collecting such sparse data from social media, and (iii) to utilize the collected data to discover …

Cited by 37 publications
(40 citation statements)
References 27 publications
“…An alternate approach to studying how women utilize the Internet during pregnancy is to examine the online content that women generate. Our prior work focused on gathering pregnancy-related information from a generic social media platform such as Twitter, whereby a cohort of pregnant women was identified via their self-reports of pregnancy and their timelines (all publicly available tweets) during pregnancy were analyzed in a case-control study of birth defects [25][26][27]. Other researchers have identified pregnant women based on their search queries, and determined their time-dependent search queries throughout pregnancy [28].…”
Section: Introduction (mentioning)
confidence: 99%
“…We handcrafted 11 regular expressions to retrieve tweets that mention adverse pregnancy outcomes, from a database containing more than 400 million public tweets posted by more than 100,000 users who have announced their pregnancy on Twitter [7]. These query patterns were designed to account for the various ways adverse pregnancy outcomes may be linguistically expressed on social media—for example, reporting a miscarriage or stillbirth through the use of rainbow baby (Pattern 2) or hashtags such as #babyloss, #pregnancyloss, #iam1in4, or #waveoflight (Pattern 9), learned through an iterative process of manually reviewing tweets matched by other query patterns [8]. Similarly, preterm birth, for example, may be expressed by the user referring to her baby as a preemie (Pattern 4), or by reporting that her baby was born at less than 37 weeks of gestation (Pattern 5) or more than three weeks early (Pattern 7).…”
Section: Experimental Design, Materials and Methods (mentioning)
confidence: 99%
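
The kinds of query patterns described in the statement above can be sketched in Python. This is only an illustration under stated assumptions: the regular expressions, the `label_tweet` helper, and the label names are hypothetical and are not the 11 patterns from the cited work. Only the example cues (rainbow baby, the four hashtags, preemie, born at less than 37 weeks, more than three weeks early) come from the quoted text.

```python
import re

# Hypothetical patterns loosely modeled on the cues quoted above.
# They are NOT the cited authors' regular expressions.
LOSS_HASHTAGS = re.compile(
    r"#(?:babyloss|pregnancyloss|iam1in4|waveoflight)\b", re.IGNORECASE
)
RAINBOW_BABY = re.compile(r"\brainbow\s+bab(?:y|ies)\b", re.IGNORECASE)
PREEMIE = re.compile(r"\bpreemies?\b", re.IGNORECASE)
BORN_AT_WEEKS = re.compile(
    r"\bborn\s+at\s+(\d{1,2})\s*(?:weeks?|wks?)\b", re.IGNORECASE
)
WEEKS_EARLY = re.compile(
    r"\b(\d{1,2})\s*(?:weeks?|wks?)\s+early\b", re.IGNORECASE
)


def label_tweet(text):
    """Return a set of coarse labels ("loss", "preterm") for one tweet."""
    labels = set()
    if LOSS_HASHTAGS.search(text) or RAINBOW_BABY.search(text):
        labels.add("loss")
    if PREEMIE.search(text):
        labels.add("preterm")
    # Only the first numeric mention is checked in this sketch.
    born = BORN_AT_WEEKS.search(text)
    if born and int(born.group(1)) < 37:  # less than 37 weeks of gestation
        labels.add("preterm")
    early = WEEKS_EARLY.search(text)
    if early and int(early.group(1)) > 3:  # more than three weeks early
        labels.add("preterm")
    return labels
```

Separating the numeric check from the regular expression keeps the week thresholds readable; a production pattern set would also need to handle spelled-out numbers and negation, which this sketch ignores.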
“…Initially, the query patterns focused on the recall of “outcome” tweets. Given the relatively low prevalence of adverse pregnancy outcomes in the general population, and our related work [8] suggesting that they may be under-reported on Twitter, high-precision query patterns would result in a sparse representation of “outcome” tweets; however, many of the preliminary regular expressions would have led to a high degree of class-imbalanced data [12] for training machine learning algorithms to automatically detect “outcome” tweets. Thus, to balance recall and precision, the final regular expressions required a reference to the user (e.g., I, our), a child (e.g., daughter, baby), or birth (e.g., born, welcome) preceding the mention of an adverse pregnancy outcome, while allowing any number of characters to occur between.…”
Section: Experimental Design, Materials and Methods (mentioning)
confidence: 99%
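
The constraint described in the statement above (a reference to the user, a child, or birth must precede the outcome mention, with any number of characters in between) might look roughly like the following. This is a minimal sketch, not the published patterns: the anchor words are limited to the six examples given in the quote, and the outcome vocabulary and the names `ANCHORED`, `UNANCHORED`, and `matches_anchored` are assumptions made for illustration.

```python
import re

# Anchor words: the six examples given in the quoted statement.
ANCHORS = r"(?:i|our|daughter|baby|born|welcome)"
# Outcome words: a hypothetical, deliberately tiny vocabulary.
OUTCOMES = r"(?:miscarriage|stillbirth|preemie)"

# Anchor first, then any number of characters, then the outcome mention.
ANCHORED = re.compile(
    rf"\b{ANCHORS}\b.*\b{OUTCOMES}\b", re.IGNORECASE | re.DOTALL
)
# Recall-oriented baseline: any outcome mention at all.
UNANCHORED = re.compile(rf"\b{OUTCOMES}\b", re.IGNORECASE)


def matches_anchored(text):
    """True if an anchor word occurs somewhere before an outcome word."""
    return ANCHORED.search(text) is not None


def matches_unanchored(text):
    """True if the text mentions an outcome word anywhere."""
    return UNANCHORED.search(text) is not None
```

Under these assumptions, a headline such as "Miscarriage rates explained" matches the unanchored baseline but not the anchored pattern, while "I had a miscarriage last year" matches both, which is the precision gain the quoted passage describes.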
“…Consequently, these disease communities use social media platforms to try to find other patients with similar health problems or expertise about their rare condition, sharing manifold types of information-including symptoms, treatments, side effects, and other diseases and activities-that go beyond what is normally captured in a clinical setting or patient registry [9]. Recently, Klein et al [10] mined Twitter to collect data on rare health-related events reported by patients, and showed that this social media platform was useful for gathering patient-centric information that could be used for future epidemiological analyses. Our hypothesis was that data from RD patient histories posted on social media would capture patients' perspectives of their health status, which may be valuable for research into ways of helping undiagnosed patients by accelerating the timeline to diagnosis and treatment.…”
Section: A Proof-of-concept Study Of Extracting Patient Histories For… (mentioning)
confidence: 99%