Ensemble of deep masked language models for effective named entity recognition in multi-domain corpora

Naderi, Nona; Knafou, Julien; Copara, Jenny; Ruch, Patrick; Teodoro, Douglas

doi:10.1101/2021.04.26.21256038

Cited by 3 publications

(3 citation statements)

References 43 publications


“…The majority of research work cited above was proposed for text written in English or Chinese. Few studies were proposed on French corpora [47,48,22,3,49]. [47] proposed a rule-based system for medication.…”

Section: Statistical Clinical NER Methods Have Been Widely Used (mentioning)

confidence: 99%

“…These three research works used private clinical annotated dataset. [22] and [49] used a publicly available dataset, provided in the context of DEFT 2020 [50] and that consists of a collection of French clinical cases. [22] proposed two models: a layered Bi-LSTM-CRF model combined with the language model CamemBERT [51], a French version of BERT and a Greedy NER model.…”

Section: Statistical Clinical NER Methods Have Been Widely Used (mentioning)

confidence: 99%

“…[22] proposed two models: a layered Bi-LSTM-CRF model combined with the language model CamemBERT [51], a French version of BERT and a Greedy NER model. [49] evaluated an ensemble approach for NER using multiple deep masked language models.…”

Section: Statistical Clinical NER Methods Have Been Widely Used (mentioning)

confidence: 99%


Privacy-preserving mimic models for clinical named entity recognition in French

Bannour, Wajsbürt, Rance, et al. 2022

Journal of Biomedical Informatics


Information Retrieval in an Infodemic: The Case of COVID-19 Publications

Teodoro, Ferdowsi, Borissov, et al. 2021

J Med Internet Res

Self Cite


Background: The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19–related corpora are being created, sometimes with inaccurate information, at a scale that is no longer amenable to human analysis. Objective: In the context of searching for scientific evidence in the deluge of COVID-19–related literature, we present an information retrieval methodology for effective identification of relevant sources to answer biomedical queries posed using natural language. Methods: Our multistage retrieval methodology combines probabilistic weighting models and reranking algorithms based on deep neural architectures to boost the ranking of relevant documents. The similarity of COVID-19 queries to documents is computed, and a series of postprocessing methods is applied to the initial ranking list to improve the match between the query and the biomedical information source and boost the position of relevant documents. Results: The methodology was evaluated in the context of the TREC-COVID challenge, achieving results competitive with the top-ranking teams participating in the competition. In particular, the combination of bag-of-words and deep neural language models significantly outperformed an Okapi Best Match 25 (BM25)–based baseline, retrieving, on average, 83% of relevant documents in the top 20. Conclusions: These results indicate that multistage retrieval supported by deep learning could enhance identification of literature for COVID-19–related questions posed using natural language.
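The multistage idea in the Methods above (a probabilistic weighting model followed by a reranking step) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the parameter defaults `k1` and `b`, the function names, and `overlap_reranker` (a hypothetical stand-in for a deep neural reranker) are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    # Simplistic whitespace tokenizer (assumption; real systems use proper analyzers).
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def overlap_reranker(query: str, doc: str) -> float:
    # Hypothetical stand-in for a neural reranker: fraction of query terms in the document.
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(doc))) / len(q_terms)


def multistage_search(
    query: str,
    docs: List[str],
    reranker: Callable[[str, str], float],
    first_stage_k: int = 3,
) -> List[int]:
    """Stage 1: shortlist by BM25. Stage 2: reorder the shortlist with `reranker`."""
    first = bm25_scores(query, docs)
    shortlist = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)[:first_stage_k]
    return sorted(shortlist, key=lambda i: reranker(query, docs[i]), reverse=True)


if __name__ == "__main__":
    corpus = [
        "covid vaccine efficacy trial",
        "weather forecast for geneva",
        "covid transmission in schools",
        "vaccine storage temperature",
    ]
    # Prints document indices, best match first.
    print(multistage_search("covid vaccine", corpus, overlap_reranker))
```

Keeping the reranker as a pluggable callable mirrors the separation between a cheap first-stage ranker and a more expensive second stage; in practice the second stage would score only the shortlist with a trained model rather than term overlap.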


Information Retrieval in an Infodemic: The Case of COVID-19 Publications (Preprint)

Teodoro, Ferdowsi, Borissov, et al. 2021

Preprint

Self Cite


BACKGROUND: The COVID-19 global health crisis has led to an exponential surge in the published scientific literature. In the attempt to tackle the pandemic, extremely large COVID-19-related corpora are being created, sometimes with inaccurate information, at a scale that is no longer amenable to human analysis. OBJECTIVE: In the context of searching for scientific evidence in the deluge of COVID-19-related literature, we present an information retrieval methodology for effective identification of relevant sources to answer biomedical queries posed using natural language. METHODS: Our multi-stage retrieval methodology combines probabilistic weighting models and re-ranking algorithms based on deep neural architectures to boost the ranking of relevant documents. The similarity of COVID-19 queries to documents is computed, and a series of post-processing methods is applied to the initial ranking list to improve the match between the query and the biomedical information source and boost the position of relevant documents. RESULTS: The methodology was evaluated in the context of the TREC-COVID challenge, achieving results competitive with the top-ranking teams participating in the competition. In particular, the combination of bag-of-words and deep neural language models significantly outperformed a BM25-based baseline, retrieving on average 83% of relevant documents in the top 20. CONCLUSIONS: These results indicate that multi-stage retrieval supported by deep learning could enhance identification of literature for COVID-19-related questions posed using natural language.
