2022
DOI: 10.1145/3498324

Automatic Extraction of Nested Entities in Clinical Referrals in Spanish

Abstract: Here we describe a new clinical corpus rich in nested entities and a series of neural models to identify them. The corpus comprises de-identified referrals from the waiting list in Chilean public hospitals. A subset of 5,000 referrals (58.6% medical and 41.4% dental) was manually annotated with 10 types of entities, six attributes, and pairs of relations with clinical relevance. In total, there are 110,771 annotated tokens. A trained medical doctor or dentist annotated these referrals, and then, together with …

Cited by 13 publications (12 citation statements)
References: 52 publications
“…Our group has experience using static word embeddings for patient classification deployed in a hospital (Villena et al, 2021b) and using stacked embeddings that combine both static and contextualized embeddings for named entity recognition (Báez et al, 2020; Báez et al, 2022) and automatic coding (Villena et al, 2021a). Evaluating the automatic translation of clinical sentences has pointed us [to] the need for creating reliable intrinsic tests created from scratch for the Spanish language, which can be valuable for both static and contextual word embeddings.…”
Section: Discussion (citation type: mentioning)
Confidence: 99%
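The statement above mentions stacked embeddings that combine static and contextualized representations. In libraries such as Flair, stacking amounts to concatenating, for each token, the vectors produced by several embedding models. The sketch below illustrates only that concatenation step; the lookup table, the toy `contextual_embed` function, and every vector value are hypothetical stand-ins, not the models used in the cited work.

```python
from typing import Dict, List

# Hypothetical static embedding table (the kind of lookup Word2vec or fastText provides).
STATIC: Dict[str, List[float]] = {
    "dolor": [0.1, 0.3],
    "abdominal": [0.2, -0.1],
}
STATIC_DIM = 2

def static_embed(token: str) -> List[float]:
    # Out-of-vocabulary tokens get a zero vector in this sketch.
    return STATIC.get(token, [0.0] * STATIC_DIM)

def contextual_embed(tokens: List[str], i: int) -> List[float]:
    # Toy stand-in for a contextual model: the vector depends on the neighbouring
    # tokens (here, just their lengths), so the same word can get different vectors.
    left = float(len(tokens[i - 1])) if i > 0 else 0.0
    right = float(len(tokens[i + 1])) if i + 1 < len(tokens) else 0.0
    return [left, right]

def stacked_embed(tokens: List[str]) -> List[List[float]]:
    # Stacking: concatenate each token's static and contextual vectors.
    return [static_embed(t) + contextual_embed(tokens, i) for i, t in enumerate(tokens)]

vectors = stacked_embed(["dolor", "abdominal"])
print(vectors)  # [[0.1, 0.3, 0.0, 9.0], [0.2, -0.1, 5.0, 0.0]]
```

Each stacked vector has length equal to the sum of the component dimensions, so a downstream NER tagger sees both the context-independent and the context-dependent view of every token.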
“…For the computation of word embeddings a training corpus is needed. For this reason, we extracted data from three sources: (1) clinical narratives from The Chilean Waiting List Corpus, which is a collection of diagnostic suspicions from the waiting list in Chilean public hospitals (Báez et al, 2020; Báez et al, 2022; Villena et al, 2021b), (2) a medical journal corpus extracted from the SciELO library, which is a collection of articles from several medical journals in Spanish (Villena et al, 2020) and (3) a corpus constructed from the Unified Medical Language System (UMLS) (Bodenreider, 2004) term graph. We computed embeddings using Word2vec and fastText algorithms and validated them using classic intrinsic evaluation tests adapted to Spanish, such as word pair similarity and semantic textual similarity.…”
Section: Introduction (citation type: mentioning)
Confidence: 99%
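The word pair similarity test mentioned above is commonly scored by comparing the cosine similarity of each pair's vectors against human similarity ratings, using Spearman rank correlation. The sketch below assumes that protocol; the Spanish words, vectors, and gold scores are invented for illustration and do not come from the cited corpora or test sets.

```python
import math
from typing import Dict, List, Tuple

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values: List[float]) -> List[float]:
    # 1-based average ranks, so tied values share the same rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x: List[float], y: List[float]) -> float:
    # Spearman's rho = Pearson correlation of the ranks (needs >= 2 non-constant items).
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def evaluate(emb: Dict[str, List[float]],
             pairs: List[Tuple[str, str, float]]) -> Tuple[float, int]:
    gold, pred = [], []
    for w1, w2, score in pairs:
        if w1 in emb and w2 in emb:  # pairs with out-of-vocabulary words are skipped
            gold.append(score)
            pred.append(cosine(emb[w1], emb[w2]))
    return spearman(gold, pred), len(gold)

# Hypothetical toy embeddings and human ratings (0-10 scale).
EMB = {"fiebre": [1.0, 0.0], "temperatura": [0.9, 0.1],
       "hueso": [0.0, 1.0], "rodilla": [0.3, 0.9]}
PAIRS = [("fiebre", "temperatura", 8.5), ("hueso", "rodilla", 7.0),
         ("fiebre", "hueso", 1.0), ("fiebre", "cefalea", 3.0)]

rho, n = evaluate(EMB, PAIRS)
print(f"Spearman rho = {rho:.3f} over {n} pairs")
```

Only the ranking of pairs matters for Spearman's rho, which is why it is preferred over raw score agreement: cosine values and human ratings live on different scales. Reporting how many pairs were skipped as out-of-vocabulary is a design choice this sketch leaves to the caller.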
“…An example of an annotation is shown in Figure 2. The result of the annotation process was a file in standoff format [4], thus including the annotation of tokens with more than one label, which is known as nested entities [21]. In particular, this work was focused on three entities: Diseases, Body Parts, and Medications.…”
Section: Manual Annotation for NER (citation type: mentioning)
Confidence: 99%
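The statement above describes standoff annotation, where labels are stored apart from the text as character offsets, so overlapping spans (nested entities) are straightforward to represent. Assuming a brat-style standoff layout (`T<id><TAB><Type> <start> <end><TAB><text>` for text-bound annotations), nested entities can be detected by checking span containment. The referral fragment and the labels `Disease` and `Body_Part` are hypothetical examples, and discontinuous spans are simply skipped.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Entity:
    ident: str
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    text: str

def parse_standoff(ann: str) -> List[Entity]:
    """Parse contiguous text-bound ('T') lines of a brat-style .ann string."""
    entities: List[Entity] = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # attributes, relations, notes, etc. are ignored here
        ident, info, text = line.split("\t", 2)
        label, offsets = info.split(" ", 1)
        if ";" in offsets:
            continue  # discontinuous spans are skipped in this sketch
        start, end = (int(x) for x in offsets.split())
        entities.append(Entity(ident, label, start, end, text))
    return entities

def nested_pairs(entities: List[Entity]) -> List[Tuple[str, str]]:
    """Return (outer_id, inner_id) for every span contained in another span."""
    pairs: List[Tuple[str, str]] = []
    for outer in entities:
        for inner in entities:
            if outer is inner:
                continue
            contains = outer.start <= inner.start and inner.end <= outer.end
            same_span = (outer.start, outer.end) == (inner.start, inner.end)
            # Identical spans with different labels are reported only once.
            if contains and (not same_span or outer.ident < inner.ident):
                pairs.append((outer.ident, inner.ident))
    return pairs

# Hypothetical referral fragment and annotations (not taken from the corpus).
TEXT = "tumor de mama izquierda"
ANN = "T1\tDisease 0 23\ttumor de mama izquierda\nT2\tBody_Part 9 23\tmama izquierda\n"

entities = parse_standoff(ANN)
for e in entities:
    assert TEXT[e.start:e.end] == e.text  # offsets agree with the text
print(nested_pairs(entities))  # [('T1', 'T2')]
```

Because each annotation carries its own offsets, the Body_Part span can sit inside the Disease span without any change to the underlying text, which a flat token-level BIO encoding cannot express directly.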
“…Although the general purpose of this dataset was to be a new resource for named entity recognition, it has also been used to obtain static word embeddings from the clinical domain (Villena et al, 2021b). These representations have boosted the model's performance in several clinical NLP tasks such as tumor encoding (Villena et al, 2021a) and named entity recognition (Báez et al, 2022).…”
Section: Clinical Flair (citation type: mentioning)
Confidence: 99%