2017
DOI: 10.1016/j.dib.2016.11.056

A corpus for mining drug-related knowledge from Twitter chatter: Language models and their utilities

Abstract: In this data article, we present to the data science, natural language processing and public health communities an unlabeled corpus and a set of language models. We collected the data from Twitter using drug names as keywords, including their common misspelled forms. Using this data, which is rich in drug-related chatter, we developed language models to aid the development of data mining tools and methods in this domain. We generated several models that capture (i) distributed word representations and (ii) prob…

Cited by 50 publications (53 citation statements)
References 10 publications
“…In addition, another corpus, also created by the Gonzalez laboratory, consists of 267,215 Twitter posts. In this corpus, two sets of language models were created to encapsulate “semantic properties” by presenting word tokens as dense vectors and “n‐gram sequences” by capturing sequential patterns. Moreover, TwiMed is one of the most recent corpus, which comprises 1,000 tweets and 1,000 PubMed sentences.…”
Section: Knowledge Discovery For Drug Interaction Using Text Mining T…
Type: mentioning
confidence: 99%
“…The corpora for social media were annotated differently from those in literature. Two corpora, created by the Gonzalez laboratory, were annotated in different scopes. One focused on entity level and another focused on language models.…”
Section: Knowledge Discovery For Drug Interaction Using Text Mining T…
Type: mentioning
confidence: 99%
“…Pimpalkhute et al [101] proposed a phonetic spelling variant generator that automatically generates common misspellings given a term. While the system has been used for collecting medication-related chatter from Twitter [102] and personal health messages [103], it may be applied to a variety of other data collection tasks. Data collection from OHCs has not faced similar challenges, as posts are usually categorized/structured, and the strategies employed have been simpler.…”
Section: Social Media Sources Topics and Data Acquisition
Type: mentioning
confidence: 99%