2018
DOI: 10.3390/data3040053

Towards the Construction of a Gold Standard Biomedical Corpus for the Romanian Language

Abstract: Gold standard corpora (GSCs) are essential for the supervised training and evaluation of systems that perform natural language processing (NLP) tasks. Currently, most of the resources used in biomedical NLP tasks are in English. Little effort has been reported for other languages, including Romanian, and access to such language resources is therefore poor. In this paper, we present the construction of the first morphologically and terminologically annotated biomedical corpus of the Romanian language (MoNERo)…

Cited by 4 publications (4 citation statements)
References 15 publications
“…One effort created a multilingual corpus (German and Spanish) of clinical text by scraping biomedical publications in those languages for clinical case reports [55]. Another effort scraped journal articles, blog posts, and books for biomedical text in Romanian topically related to three medical specialties -- cardiology, diabetes, and endocrinology -- and also added layers of linguistic annotation to facilitate model training [56].…”
Section: Results
Citation type: mentioning
confidence: 99%
“…The aim of the Special Issue "Curative Power of Medical Data" of the Data Journal was to develop a community of researchers involved in biomedical research. This Special Issue contains four surveys, which include a wide range of topics, from author confidence in biomedical articles [18]; generating tests from medical references [19]; constructing a Gold standard biomedical corpus for the Romanian language [20]; up to the visualization of biomedical data among the Chinese elderly [21].…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Besides the development of tools, this Special Issue also introduces a resource, i.e., the first morphologically and terminologically annotated biomedical corpus of the Romanian language [20]. With almost 14,000 tokens distributed in three medical subdomains (cardiology, diabetes and endocrinology), the corpus contains manually validated parts of speech and named entity annotations, useful for training specific biomedical applications.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…This conclusion means that the rules provided in the guidelines may create ambiguity and need to be modified for that label. GAA is a measure that compares the overlap between the documents labelled by the annotators and the gold standard corpus (GSC) [34], which is a trustworthy annotated corpus.…”
Section: Manual Labelling Of the Validation Corpus (A12)
Citation type: mentioning
confidence: 99%