Towards the Construction of a Gold Standard Biomedical Corpus for the Romanian Language

Mitrofan, Maria; Mititelu, Verginica Barbu; Mitrofan, Grigorina

doi:10.3390/data3040053

Cited by 4 publications

(4 citation statements)

References 15 publications

“…One effort created a multilingual corpus (German and Spanish) of clinical text by scraping biomedical publications in those languages for clinical case reports [ 55 ]. Another effort scraped journal articles, blog posts, and books for biomedical text in Romanian topically related to three medical specialties -- cardiology, diabetes, and endocrinology -- and also added layers of linguistic annotation to facilitate model training [ 56 ].…”

Section: Results (mentioning)

confidence: 99%

A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records

et al. 2021

Summary Objectives: We survey recent work in biomedical NLP on building more adaptable or generalizable models, with a focus on work dealing with electronic health record (EHR) texts, to better understand recent trends in this area and identify opportunities for future research. Methods: We searched PubMed, the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computational Linguistics (ACL) anthology, the Association for the Advancement of Artificial Intelligence (AAAI) proceedings, and Google Scholar for the years 2018-2020. We reviewed abstracts to identify the most relevant and impactful work, and manually extracted data points from each of these papers to characterize the types of methods and tasks that were studied, in which clinical domains, and current state-of-the-art results. Results: The ubiquity of pre-trained transformers in clinical NLP research has contributed to an increase in domain adaptation and generalization-focused work that uses these models as the key component. Most recently, work has started to train biomedical transformers and to extend the fine-tuning process with additional domain adaptation techniques. We also highlight recent research in cross-lingual adaptation, as a special case of adaptation. Conclusions: While pre-trained transformer models have led to some large performance improvements, general domain pre-training does not always transfer adequately to the clinical domain due to its highly specialized language. There is also much work to be done in showing that the gains obtained by pre-trained transformers are beneficial in real world use cases. The amount of work in domain adaptation and transfer learning is limited by dataset availability and creating datasets for new domains is challenging. The growing body of research in languages other than English is encouraging, and more collaboration between researchers across the language divide would likely accelerate progress in non-English clinical NLP.

“…The aim of the Special Issue "Curative Power of Medical Data" of the Data Journal was to develop a community of researchers involved in biomedical research. This Special Issue contains four surveys, which include a wide range of topics, from author confidence in biomedical articles [18]; generating tests from medical references [19]; constructing a Gold standard biomedical corpus for the Romanian language [20]; up to the visualization of biomedical data among the Chinese elderly [21].…”

Section: Discussion (mentioning)

confidence: 99%

“…Besides the development of tools, this Special Issue also introduces a resource, i.e., the first morphologically and terminologically annotated biomedical corpus of the Romanian language [20]. With almost 14,000 tokens distributed in three medical subdomains (cardiology, diabetes and endocrinology), the corpus contains manually validated parts of speech and named entity annotations, useful for training specific biomedical applications.…”

Section: Discussion (mentioning)

confidence: 99%

Special Issue on the Curative Power of Medical Data

Gîfu, Trandabăț, Cohen et al. 2019

Data

With the massive amounts of medical data made available online, language technologies have proven to be indispensable in processing biomedical and molecular biology literature, health data or patient records. With a huge number of reports, evaluating their impact has long ceased to be a trivial task. Linking the contents of these documents to each other, as well as to specialized ontologies, could enable access to and the discovery of structured clinical information and could foster a major leap in natural language processing and in health research. The aim of this Special Issue, “Curative Power of Medical Data” in Data, is to gather innovative approaches for the exploitation of biomedical data using semantic web technologies and linked data by developing community involvement in biomedical research. This Special Issue contains four surveys, which cover a wide range of topics, from the analysis of the writing style of biomedical articles, to automatically generating tests from medical references, constructing a Gold standard biomedical corpus, and the visualization of biomedical data.

“…This conclusion means that the rules provided in the guidelines may create ambiguity and need to be modified for that label. GAA is a measure that compares the overlap between the documents labelled by the annotators and the gold standard corpus (GSC) [34], which is a trustworthy annotated corpus.…”

Section: Manual Labelling Of the Validation Corpus (A12) (mentioning)

confidence: 99%

DEEP, a methodology for entity extraction using organizational patterns: Application to job offers

Ramdani, Brun, Bonjour et al. 2022

Knowledge-Based Systems
