2022
DOI: 10.3390/app12010491

Analysis of the Full-Size Russian Corpus of Internet Drug Reviews with Complex NER Labeling Using Deep Learning Neural Networks and Language Models

Abstract: The paper presents the full-size Russian corpus of Internet users’ reviews on medicines with complex named entity recognition (NER) labeling of pharmaceutically relevant entities. We evaluate the accuracy levels reached on this corpus by a set of advanced deep learning neural networks for extracting mentions of these entities. The corpus markup includes mentions of the following entities: medication (33,005 mentions), adverse drug reaction (1778), disease (17,403), and note (4490). Two of them, medication and d…

Cited by 7 publications (12 citation statements). References: 46 publications.
“…Current study is founded on the dataset Russian Drug Review Corpus (RDRS) [69]. The dataset contains user reviews on medications from the Otzovik.ru site.…”
Section: Methods (mentioning)
confidence: 99%
“…In this paper, we propose an End-to-End method to the tasks of NER and RE based on a cascade approach, combining the best solutions based on the pre-trained language models (see Section "Materials and Methods"). To fine-tune the models that are parts of the proposed solution, the Russian Drug Review Corpus (RDRS) has been used [15] (see Section "Dataset"). The set of the named entity types used for the study includes: Adverse Drug Reaction (ADR), Drug name, Disease name, Indication, and Source of the medication information, that are the basic set for pharmacovigilance purposes.…”
Section: Anton Selivanov (mentioning)
confidence: 99%
“…The solution to this problem is based on the approach we proposed in [15], that implements multilabel classification of text tokens according to the BIO scheme. This named entity markup scheme assumes that the first token of the entity receives the tag "B-«entity class name»" (beginning of the entity), subsequent ones "I-«entity class name»" (in the entity), and tokens that aren't included in the entity get tag "O" (out of context).…”
Section: Named Entity Recognition (mentioning)
confidence: 99%
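The BIO scheme quoted above can be illustrated with a minimal Python sketch. It covers only converting entity spans to per-token tags and back, not the neural tagger that predicts the tags. The entity class names `Drugname` and `ADR`, the example sentence, and the helper names `spans_to_bio` and `bio_to_spans` are hypothetical illustrations, not taken from the cited works.

```python
from typing import List, Optional, Tuple

# (start token index, end token index exclusive, entity class name)
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as BIO tags: B- on the first token, I- on the rest, O elsewhere."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans; a stray I- tag is leniently treated as a B- tag."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == label
        if continues:
            continue  # token extends the currently open entity
        if start is not None:
            spans.append((start, i, label))  # close the open entity
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]  # open a new entity
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


# Hypothetical review fragment: "Nurofen gave me a severe headache"
tokens = ["Nurofen", "gave", "me", "a", "severe", "headache"]
gold = [(0, 1, "Drugname"), (4, 6, "ADR")]
tags = spans_to_bio(len(tokens), gold)
print(tags)  # ['B-Drugname', 'O', 'O', 'O', 'B-ADR', 'I-ADR']
print(bio_to_spans(tags))  # [(0, 1, 'Drugname'), (4, 6, 'ADR')]
```

Treating an orphan `I-` tag as the start of a new entity is one common decoding convention for noisy model output; a stricter decoder could instead discard such tokens.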