Proceedings of the 18th BioNLP Workshop and Shared Task 2019
DOI: 10.18653/v1/w19-5008

MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language

Abstract: In an era when large amounts of data are generated daily in various fields, the biomedical field among others, linguistic resources can be exploited for various tasks of Natural Language Processing. Moreover, an increasing number of biomedical documents are available in languages other than English. To be able to extract information from natural language free text resources, methods and tools are needed for a variety of languages. This paper presents the creation of the MoNERo corpus, a gold standard biomedical c…

citations
Cited by 9 publications
(7 citation statements)
references
References 23 publications
“…We hope that recently released corpora, e.g. the BioRo corpus for Romanian (Mitrofan and Tufis, 2018), can boost performance of MT systems for these languages. However, more parallel corpora are certainly necessary not only for those languages that scored worst in this challenge, but also for the many other languages that we did not evaluate here.…”
Section: Differences Across Languages (mentioning)
confidence: 99%
“…There is active development of parallel corpora in this domain (see the recent survey in (Névéol et al, 2018)). In this year alone, three new corpora have been published in a single conference: a compilation of full texts from the Scielo database for English, Portuguese, and Spanish, medical documents and glossaries for Spanish/English (Villegas et al, 2018) and a biomedical corpus for Romanian (Mitrofan and Tufis, 2018). However, in spite of the growing number of parallel corpora and the many open source tools for MT (e.g., Moses (Koehn et al, 2007), OpenNMT (Klein et al, 2017) and Marian (Junczys-Dowmunt et al, 2018)), there is still no ready-to-use tool for automatic translation of biomedical publications for any language pair.…”
Section: Introduction (mentioning)
confidence: 99%
“…medical Scientific articles/books in the field of medicine (e.g. cardiology, diabetes, endocrinology for Romanian-SiMoNERo by Mitrofan et al, 2019). It is subsumed by academic for some treebanks (e.g.…”
Section: Available Metadata (mentioning)
confidence: 99%
“…For the Romanian language, the MoNERo corpus [8] was created as a medical gold standard corpus with morphological and named entities annotations. The corpus consists of 4,989 sentences from articles related to cardiology, diabetes, and endocrinology and was annotated with four NE types: anatomy, chemicals and drugs, disorders, and procedures.…”
Section: Related Work (mentioning)
confidence: 99%