Towards transforming FDA adverse event narratives into actionable structured data for improved pharmacovigilance

Wunnava, Susmitha; Qin, Xiao; Kakar, Tabassum; Socrates, Vimig; Wallace, Amber; Rundensteiner, Elke A.

doi:10.1145/3019612.3022875

Cited by 9 publications

(1 citation statement)

References 19 publications

“…NLP has been widely adapted to accelerate processing time for large data sets such as pharmacovigilance data, electronic health records, and social media data (Wong et al, 2018). A previous study found that rule-based approaches are superior to machine learning approaches for the extraction of demographic variables from FAERS data and suggested using rules that are based on raw text strings over rules that are based on Part-Of-Speech tags of individual tokens for higher performance (Wunnava et al, 2017). Our NLP tool has four algorithms that use rules based on raw text strings; each algorithm is created to extract a demographic variable of interest from the free-text narrative.…”

Section: Natural Language Processing Tool (mentioning)

confidence: 99%

Evaluation of a natural language processing tool for extracting gender, weight, ethnicity, and race in the US food and drug administration adverse event reporting system

Dang, Kortepeter, et al. 2022

Front. Drug Saf. Regul.

The US Food and Drug Administration Adverse Event Reporting System (FAERS) contains over 24 million individual case safety reports (ICSRs). In this research project, we evaluated a natural language processing (NLP) tool’s ability to extract four demographic variables (gender, weight, ethnicity, race) from ICSR narratives. Specificity of the NLP algorithm was over 94% for all demographics, while sensitivity varied between the demographics: 98.6% (gender), 45.5% (weight), 100% (ethnicity), and 85.3% (race). Among ICSRs missing weight, ethnicity, and race in the structured field, few cases had this information in the narrative (>95% missing); consequently, the positive predictive value (PPV) for these three demographics had wide 95% confidence intervals. After NLP implementation, the total number of ICSRs missing gender was reduced by 33% (i.e., NLP identified 472 thousand reports having a gender value in the narrative that was not in the structured field), while the total number of ICSRs missing weight, ethnicity, or race was reduced by less than 4%. This study demonstrated that the implementation of an NLP tool can provide meaningful improvements in the availability of gender information for pharmacovigilance activities conducted with FAERS data. In contrast, NLP tools targeting the extraction of weight, ethnicity, or race from free-text fields have minimal impact largely because the information was infrequently provided by the reporter. Further gains in completeness of these fields must originate from increases in provision of demographic information from the reporter rather than informatic solutions.
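
As a rough illustration of rule-based extraction from raw text strings, and of how sensitivity and specificity might be scored against a manually reviewed reference, a minimal Python sketch follows. The keyword patterns, function names, and example narratives are hypothetical and are not taken from the FAERS tool, whose actual rules are not described here. The scoring only checks whether a value was found, not whether the value itself is correct.

```python
import re

# Hypothetical raw-text rules; the real tool's patterns are not given in the abstract.
FEMALE = re.compile(r"\b(female|woman|she|her)\b", re.IGNORECASE)
MALE = re.compile(r"\b(male|man|he|his)\b", re.IGNORECASE)


def extract_gender(narrative):
    """Return 'F', 'M', or None when the narrative is silent or ambiguous."""
    has_f = bool(FEMALE.search(narrative))
    has_m = bool(MALE.search(narrative))
    if has_f == has_m:  # neither rule fired, or both did
        return None
    return "F" if has_f else "M"


def sensitivity_specificity(predicted, truth):
    """Score found/not-found decisions against a reviewed reference list."""
    pairs = list(zip(predicted, truth))
    tp = sum(p is not None and t is not None for p, t in pairs)
    fn = sum(p is None and t is not None for p, t in pairs)
    tn = sum(p is None and t is None for p, t in pairs)
    fp = sum(p is not None and t is None for p, t in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Invented example narratives, for illustration only.
narratives = [
    "A 54-year-old female patient developed a rash.",
    "The patient reported that he felt dizzy.",
    "Patient developed headache.",
]
predictions = [extract_gender(n) for n in narratives]
```

Word-boundary anchors keep "male" from matching inside "female" and "he" from matching inside "she" or "headache". A production rule set would need far broader coverage than these few keywords.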
