2021
DOI: 10.1007/978-3-030-77211-6_10
|View full text |Cite
|
Sign up to set email alerts
|

Addressing Extreme Imbalance for Detecting Medications Mentioned in Twitter User Timelines

Abstract: Tweets mentioning medications are valuable for efforts in digital epidemiology to supplement traditional methods of monitoring public health. A major obstacle, however, is to differentiate them from the large majority of tweets on other topics posted in a user's timeline: solving the infamous 'needle in a haystack' problem. While deep learning models have significantly improved classification, their performance and inference processing time remain low on extremely imbalanced corpora where the tweets of interes… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
5
0

Year Published

2021
2021
2023
2023

Publication Types

Select...
2
1

Relationship

0
3

Authors

Journals

citations
Cited by 3 publications
(5 citation statements)
references
References 15 publications
0
5
0
Order By: Relevance
“…At the beginning of the competition, code, documentation and trained models of a baseline extractor were released by the organizers to help participants start their development. Its detailed description and evaluation can be found in ( 12 ); we summarize the details of the system in this section.…”
Section: Discussionmentioning
confidence: 99%
See 1 more Smart Citation
“…At the beginning of the competition, code, documentation and trained models of a baseline extractor were released by the organizers to help participants start their development. Its detailed description and evaluation can be found in ( 12 ); we summarize the details of the system in this section.…”
Section: Discussionmentioning
confidence: 99%
“…The generic references were manually compiled. In ( 12 ), the authors found that the lexical match approach alone for classification achieved good recall (0.756) but low precision (0.253) due to the ambiguity of some entries in the lexicon. Thus, they improved the performance of the classifier by training a (BERT) Bidirectional Encoder Representations from Transformers model to disambiguate the tweets selected by the lexicon match step.…”
Section: Discussionmentioning
confidence: 99%
“…Previous research ( 2 ) shows that better results are obtained by adding a classification step before medication name extraction to classify whether a tweet contains medication names. In this paper, in addition to classifying the tweet in the first step, we also reversed the order and checked whether the extraction was correct by classifying it with PLM after the drug name was extracted.…”
Section: Methodsmentioning
confidence: 99%
“…For the extractive QA component, we only use the Splinter model, which has the highest performance. In addition, we used the medication name dictionary ( 2 ) from the baseline method provided by the task organizer to exclude data for which none of the tokens of selected spans belong to the dictionary. The lexicon of medication names in the dictionary was tokenized by spaces.…”
Section: Methodsmentioning
confidence: 99%
See 1 more Smart Citation