2021
DOI: 10.3389/frma.2021.689803

Ensemble of Deep Masked Language Models for Effective Named Entity Recognition in Health and Life Science Corpora

Abstract: The health and life science domains are well known for their wealth of named entities found in large free text corpora, such as scientific literature and electronic health records. To unlock the value of such corpora, named entity recognition (NER) methods are proposed. Inspired by the success of transformer-based pretrained models for NER, we assess how individual and ensemble of deep masked language models perform across corpora of different health and life science domains (biology, chemistry, and medicine) av…

Cited by 11 publications (6 citation statements)
References 60 publications
“…Thus, it is unclear how the proposed methodology will generalize to corpora and categories used in other reviews and living evidence knowledge bases. That said, given the strong performance obtained in other corpus types by a similar methodology (31), we believe that it shall generalize well. Second, in our experiments, we fail to explore the full contents of the articles.…”
Section: Discussion (supporting)
confidence: 53%
“…Then, at inference time, the classifiers were applied to individual records to predict the publication category as output. Two ensemble strategies were created using these predictions (29,31). The first strategy uses a voting system that takes each classifier output as a vote for a class, while the second considers the sum of the class probabilities attributed by the individual classifiers.…”
Section: Methods (mentioning)
confidence: 99%
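The two ensemble strategies described in this statement (one vote per classifier versus a sum of class probabilities) can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function names `hard_vote` and `soft_vote`, the class labels, and the probability values are all hypothetical, and tie-breaking (first-seen label wins) is an assumption the statement does not address.

```python
from collections import Counter


def hard_vote(probabilities):
    """Voting strategy: each classifier casts one vote for its top class.

    `probabilities` is a list with one dict per classifier, mapping
    class label -> predicted probability.
    """
    votes = [max(dist, key=dist.get) for dist in probabilities]
    # most_common sorts by count; equal counts keep first-seen order,
    # so ties go to the earliest classifier's choice (an assumption).
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probabilities):
    """Probability-sum strategy: add up each class's probability
    across classifiers and return the class with the largest total."""
    totals = {}
    for dist in probabilities:
        for label, p in dist.items():
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=totals.get)


# Hypothetical outputs of three classifiers for one record.
example = [
    {"rct": 0.55, "review": 0.45},
    {"rct": 0.55, "review": 0.45},
    {"rct": 0.05, "review": 0.95},
]

print(hard_vote(example))  # two of three classifiers prefer "rct"
print(soft_vote(example))  # summed probabilities favour "review"
```

The example is chosen so that the two strategies disagree: two weakly confident classifiers outvote one highly confident classifier under voting, while the probability sum lets the confident classifier dominate.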
“…Furthermore, IberLEF 2022 [17] and BioASQ 2022 [116] released their datasets translated into seven other languages, encouraging future contributions to multilingual medical NLP. For the French language, the CAS corpus [27] was used in DEFT [53, 120-124], an annual French-speaking text-mining challenge. The 2020 edition of DEFT involved the automatic annotation of 13 different medical entity types, while the 2021 edition proposed to identify the patient's clinical profile through multilabel classification of diseases using the Medical Subject Headings (MeSH) thesaurus.…”
Section: Shared Tasks (mentioning)
confidence: 99%
“…Having trained multiple NER models, we use an ensemble strategy based on a majority vote to assign the predictions (Copara et al, 2020b,a; Knafou et al, 2020; Naderi et al, 2021). More in detail, for a given sentence S, three NER models infer their predictions independently.…”
Section: Ensemble of the NER Models (mentioning)
confidence: 99%
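A majority vote over the tag sequences of three NER models, as this statement describes, might look like the sketch below. The statement is truncated before it explains how votes are counted, so several details here are assumptions rather than the citing authors' method: voting is done per token, tags use a BIO scheme with hypothetical labels such as `B-CHEM`, and a token with no strict majority falls back to `O`. The function name `majority_vote_tags` is also hypothetical.

```python
from collections import Counter


def majority_vote_tags(model_tags, fallback="O"):
    """Merge per-token tags from several NER models by majority vote.

    `model_tags` holds one tag sequence per model, all for the same
    tokenized sentence. A tag is kept only if more than half of the
    models agree on it; otherwise `fallback` is used (an assumption).
    """
    if not model_tags:
        return []
    length = len(model_tags[0])
    if any(len(tags) != length for tags in model_tags):
        raise ValueError("all models must tag the same number of tokens")

    n_models = len(model_tags)
    merged = []
    # zip(*model_tags) yields, for each token, the tags from all models.
    for token_tags in zip(*model_tags):
        tag, count = Counter(token_tags).most_common(1)[0]
        merged.append(tag if count > n_models / 2 else fallback)
    return merged


# Hypothetical predictions of three models for a four-token sentence.
model_a = ["B-CHEM", "I-CHEM", "O", "B-DIS"]
model_b = ["B-CHEM", "O", "O", "B-DIS"]
model_c = ["B-CHEM", "I-CHEM", "O", "O"]

print(majority_vote_tags([model_a, model_b, model_c]))
```

Voting token by token is simple but can yield an invalid BIO sequence (for example an `I-` tag without a preceding `B-` tag), so a real pipeline would likely need a repair step or span-level voting.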