Proceedings of the 3rd Clinical Natural Language Processing Workshop 2020
DOI: 10.18653/v1/2020.clinicalnlp-1.32

The Chilean Waiting List Corpus: a new resource for clinical Named Entity Recognition in Spanish

Abstract: In this work we describe the Waiting List Corpus, consisting of de-identified referrals for several specialty consultations from the waiting list in Chilean public hospitals. A subset of 900 referrals was manually annotated with 9,029 entities, 385 attributes, and 284 pairs of relations with clinical relevance. A trained medical doctor annotated these referrals and then, together with three other researchers, consolidated each of the annotations. The annotated corpus has nested entities, with 32.2% of entities …
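A nested-entity percentage depends on how "nested" is defined, and the abstract's definition is cut off here. A minimal sketch, assuming character-offset span annotations and counting an entity as nested when it contains, or is contained in, another entity (an assumption, not necessarily the paper's exact rule). The example text and labels are hypothetical and not taken from the corpus schema:

```python
from typing import List, Tuple

# Hypothetical annotation format: (start, end, label), character offsets, end exclusive.
Span = Tuple[int, int, str]


def nested_share(spans: List[Span]) -> float:
    """Fraction of entities that contain, or are contained in, another entity."""
    if not spans:
        return 0.0
    nested = 0
    for i, (start, end, _) in enumerate(spans):
        for j, (other_start, other_end, _) in enumerate(spans):
            if i == j:
                continue
            inside = other_start <= start and end <= other_end
            contains = start <= other_start and other_end <= end
            if inside or contains:
                nested += 1
                break  # count each entity at most once
    return nested / len(spans)


# Toy referral text with hypothetical labels (not the corpus's actual schema).
text = "dolor en rodilla derecha, HTA"
spans = [
    (0, 24, "Finding"),        # "dolor en rodilla derecha"
    (9, 24, "Body_Part"),      # "rodilla derecha", nested inside the finding
    (26, 29, "Abbreviation"),  # "HTA", not nested
]
for start, end, label in spans:
    print(label, repr(text[start:end]))
print(f"nested share: {nested_share(spans):.3f}")
```

Partially overlapping spans (neither inside the other) are not counted as nested under this definition.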

Cited by 22 publications (19 citation statements)
References 29 publications
“…Our group has experience using static word embeddings for patient classification deployed in a hospital (Villena et al, 2021b ) and using stacked embeddings that combine both static and contextualized embeddings for named entity recognition (Báez et al, 2020 ; Báez et al, 2022 ) and automatic coding (Villena et al, 2021a ). Evaluating the automatic translation of clinical sentences has pointed us [to] the need for creating reliable intrinsic tests created from scratch for the Spanish language, which can be valuable for both static and contextual word embeddings.…”
Section: Discussion
Citation type: mentioning (confidence: 99%)
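Stacked embeddings, as mentioned in the snippet above, represent each token by concatenating the vectors produced by several embedding models; Flair's `StackedEmbeddings` class combines embeddings this way. A minimal plain-Python sketch, assuming a static lookup table plus a hypothetical toy function standing in for a contextual model (a real setup would use, for example, a character-level language model):

```python
from typing import Callable, Dict, List

Vector = List[float]


def stack_embeddings(
    tokens: List[str],
    static: Dict[str, Vector],
    contextual: Callable[[List[str]], List[Vector]],
    static_dim: int,
) -> List[Vector]:
    """Concatenate each token's static vector with its contextual vector."""
    ctx_vectors = contextual(tokens)  # computed over the whole sentence
    stacked = []
    for token, ctx in zip(tokens, ctx_vectors):
        # Out-of-vocabulary tokens get a zero static vector in this sketch.
        static_vec = static.get(token.lower(), [0.0] * static_dim)
        stacked.append(list(static_vec) + list(ctx))
    return stacked


def toy_contextual(tokens: List[str]) -> List[Vector]:
    # Hypothetical stand-in for a contextual model: a token's vector depends on
    # its relative position in the sentence, so the same word can receive
    # different vectors in different sentences.
    n = len(tokens)
    return [[i / (n - 1) if n > 1 else 0.0] for i in range(n)]


# Invented toy static vectors, for illustration only.
static_vectors = {"dolor": [0.2, 0.7], "rodilla": [0.9, 0.1]}
sentence = ["Dolor", "rodilla", "derecha"]
result = stack_embeddings(sentence, static_vectors, toy_contextual, static_dim=2)
print(result)
```

Concatenation keeps both signals side by side, so a downstream tagger can learn how much weight to give the static and the contextual part of each token vector.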
“…For the computation of word embeddings a training corpus is needed. For this reason, we extracted data from three sources: (1) clinical narratives from The Chilean Waiting List Corpus, which is a collection of diagnostic suspicions from the waiting list in Chilean public hospitals (Báez et al, 2020 ; Báez et al, 2022 ; Villena et al, 2021b ), (2) a medical journal corpus extracted from the SciELO library, which is a collection of articles from several medical journals in Spanish (Villena et al, 2020 ) and (3) a corpus constructed from the Unified Medical Language System (UMLS) (Bodenreider, 2004 ) term graph. We computed embeddings using Word2vec and fastText algorithms and validated them using classic intrinsic evaluation tests adapted to Spanish, such as word pair similarity and semantic textual similarity.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
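The word pair similarity test mentioned above is commonly scored as the Spearman rank correlation between human similarity judgments and the cosine similarity of the two word vectors. A minimal plain-Python sketch; the vectors, word pairs, and gold scores are invented toy values, and out-of-vocabulary pairs are simply skipped (fastText can instead build vectors for unseen words from character n-grams):

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def word_pair_spearman(
    vectors: Dict[str, List[float]], pairs: List[Tuple[str, str, float]]
) -> float:
    """Spearman correlation between cosine similarities and gold scores."""
    model, gold = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:  # skip out-of-vocabulary pairs
            model.append(cosine(vectors[w1], vectors[w2]))
            gold.append(score)
    if len(model) < 2:
        raise ValueError("need at least two in-vocabulary pairs")
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(average_ranks(model), average_ranks(gold))


# Invented toy vectors and gold scores, for illustration only.
vectors = {
    "fiebre": [1.0, 0.0],
    "pirexia": [0.9, 0.1],
    "tos": [0.0, 1.0],
    "rodilla": [0.5, -0.5],
}
pairs = [
    ("fiebre", "pirexia", 9.0),
    ("fiebre", "tos", 3.0),
    ("tos", "rodilla", 1.0),
    ("fiebre", "cefalea", 7.0),  # "cefalea" is OOV here, so this pair is skipped
]
print(f"Spearman rho: {word_pair_spearman(vectors, pairs):.3f}")
```

Rank correlation is used rather than raw Pearson correlation because only the ordering of similarities is expected to match human judgments, not their scale.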
“…These models trained on a large corpus obtained from the Spanish Wikipedia are freely available in the Flair framework (Akbik et al, 2019). To incorporate key information from the clinical context, we fine-tuned these models on the Chilean Waiting List corpus (Báez et al, 2020), which is a clinical corpus created from real diagnoses from the Chilean public healthcare system.…”
Section: Clinical Flair
Citation type: mentioning (confidence: 99%)