2020
DOI: 10.21203/rs.2.22697/v1
Preprint

Testing Contextualized Word Embeddings to Improve NER in Spanish Clinical Case Narratives

Abstract: Background: In the Big Data era there is an increasing need to fully exploit and analyse the huge quantity of information available about health. Natural Language Processing (NLP) technologies can help extract relevant information from unstructured data contained in Electronic Health Records (EHR), such as clinical notes, patients' discharge summaries and radiology reports, among others. Extracted information could help in health-related decision-making processes. Named entity recognition (NER) devoted …

Citations: cited by 3 publications (4 citation statements)
References: 26 publications
“…Nowadays, there is a strong development of contextualized word embeddings that assign dynamic representations to words based on their contexts, achieving state-of-the-art performance in multiple tasks. For the clinical domain in Spanish, relevant works include (Akhtyamova et al, 2020; Carrino et al, 2022; Rojas et al, 2022). These contextualized word embeddings are challenging to compute and deploy in production environments due to their demanding infrastructure needs.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…In sum, we have worked with a corpus of 235 million tokens, which is a good number compared to, for example, the 86 million tokens collected by Akhtyamova et al (2020). Nevertheless, our corpus is still way below the 13.5 billion tokens used to calculate BERT for the clinical domain in English (Lee et al, 2020).…”
Section: Medical Journals Corpus
Citation type: mentioning
confidence: 99%
“…It allows capturing the similarity between individual word vectors, thus providing information on the underlying word meanings [38]. Several pretrained word embedding models have been developed, such as word2vec, global vectors for word representation (GloVe) and FastText [39].…”
Section: Text Mining
Citation type: mentioning
confidence: 99%
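
The last citation statement says that word embeddings capture similarity between individual word vectors. One common way to measure that similarity is the cosine of the angle between two vectors. The sketch below is illustrative only: the three-dimensional vectors and the `toy_vectors` name are invented for this example and do not come from word2vec, GloVe, FastText, or any model discussed in the cited works.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, invented for illustration (real embeddings
# typically have hundreds of dimensions and are learned from a corpus).
toy_vectors = {
    "fiebre": [0.9, 0.1, 0.0],
    "febril": [0.8, 0.2, 0.1],
    "hospital": [0.0, 0.2, 0.9],
}

sim_related = cosine_similarity(toy_vectors["fiebre"], toy_vectors["febril"])
sim_unrelated = cosine_similarity(toy_vectors["fiebre"], toy_vectors["hospital"])

print(f"fiebre vs febril:   {sim_related:.3f}")
print(f"fiebre vs hospital: {sim_unrelated:.3f}")
```

With these made-up values, the related pair scores higher than the unrelated pair, which is the behaviour the statement describes. Static embeddings of this kind assign one vector per word, whereas the contextualized embeddings tested in the preprint produce a different vector for each occurrence of a word depending on its context.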