2022
DOI: 10.1016/j.jbi.2021.103961

The RareDis corpus: A corpus annotated with rare diseases, their signs and symptoms

Citations: cited by 15 publications (20 citation statements)
References: 34 publications
“…Similar phenotypic features may be recorded in a heterogeneous manner by biomedical annotators using the same source documents. This inter- and intra-observer variability is a well-known phenomenon (17, 18). However, the issue of clinical phenotype heterogeneity across disease models is less well studied.…”
Section: Results (mentioning)
confidence: 99%
“…Additionally, since the corpus might be unbalanced, as the number of tokens tagged "O" is much higher than other labels, Kappa would be high and overestimates IAA. On the other hand, ignoring the "O" label yields low Kappa scores (Hripcsak and Rothschild, 2005; Brandsen et al, 2020; Campillos-Llanos et al, 2021; Martínez-deMiguel et al, 2022). Instead, they recommended to use the F1-score, as it might be more appropriate in reflecting the IAA in NER tasks.…”
Section: Inter-annotator Agreement (mentioning)
confidence: 99%
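
The contrast drawn in the statement above, between token-level Kappa and F1 as measures of inter-annotator agreement (IAA), can be illustrated with a minimal sketch. The function names `cohen_kappa` and `pairwise_f1`, the label names, and the toy sequences are assumptions made for illustration; they are not taken from the RareDis paper or from the citing work, whose exact computation is not described here.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences (all labels, including "O")."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two annotators' marginal label frequencies.
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label everywhere
    return (observed - expected) / (1 - expected)

def pairwise_f1(a, b, outside="O"):
    """Token-level F1 between two annotators, counting only non-"O" labels.

    Annotator `a` is treated as the reference; swapping the arguments swaps
    precision and recall, so the F1 value itself does not change.
    """
    true_pos = sum(x == y and x != outside for x, y in zip(a, b))
    labelled_a = sum(x != outside for x in a)
    labelled_b = sum(y != outside for y in b)
    if true_pos == 0:
        return 0.0
    precision = true_pos / labelled_b
    recall = true_pos / labelled_a
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: 16 unlabelled tokens followed by 4 entity tokens.
annotator_a = ["O"] * 16 + ["DISEASE", "DISEASE", "SIGN", "SIGN"]
annotator_b = ["O"] * 16 + ["DISEASE", "O", "SIGN", "SYMPTOM"]

print("kappa:", round(cohen_kappa(annotator_a, annotator_b), 3))
print("F1:   ", round(pairwise_f1(annotator_a, annotator_b), 3))
```

In this toy case the many matching "O" tokens raise the observed agreement that Kappa is built on, whereas the F1 value depends only on the tokens that at least one annotator labelled as an entity.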
“…We use the RareDis corpus [41], which is a collection of texts from the Rare Disease database (NORD) [1]. These texts were manually annotated with four entity types (diseases, rare diseases, signs, and symptoms).…”
Section: Dataset (mentioning)
confidence: 99%
“…Table 1 shows the number of the entity types annotated, as well as the number of documents, sentences, and tokens in each split. A more detailed description of the RareDis corpus can be found in [41]. The corpus contains a total of 9,318 entities.…”
Section: Dataset (mentioning)
confidence: 99%