2020
DOI: 10.1007/s42600-020-00067-7

Defining a state-of-the-art POS-tagging environment for Brazilian Portuguese clinical texts

Abstract: Purpose: Natural language processing (NLP) techniques are essential for unlocking patient data from electronic health records. An important NLP task is recognizing morphosyntactic information in text, a process called part-of-speech (POS) tagging. Neural network architectures are currently the state-of-the-art method, but few studies have applied this approach to Brazilian Portuguese clinical texts. The objective of this study is to define a state-of-the-art POS-tagging en…
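To make the task concrete, the sketch below tags tokens with a most-frequent-tag baseline. This is deliberately not the neural approach the study evaluates; it is a simple stand-in that only shows the input (a list of tokens) and the output (one tag per token). The toy corpus, the tag names (loosely modeled on Mac-Morpho-style tags), and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (token, tag) pairs, tagged by hand for illustration.
# Tag names are loosely Mac-Morpho-style; this is NOT data from the study.
TOY_CORPUS = [
    [("o", "ART"), ("paciente", "N"), ("apresenta", "V"), ("febre", "N")],
    [("a", "ART"), ("paciente", "N"), ("nega", "V"), ("dor", "N")],
    [("paciente", "N"), ("apresenta", "V"), ("dor", "N"), ("abdominal", "ADJ")],
]


def train_most_frequent_tag(tagged_sentences):
    """Learn each word's most frequent tag plus a global fallback tag."""
    word_tags = defaultdict(Counter)
    tag_totals = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            # Lowercase so uppercase clinical notes map to the same lexicon entry.
            word_tags[word.lower()][tag] += 1
            tag_totals[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    fallback = tag_totals.most_common(1)[0][0]  # overall most frequent tag, for unseen words
    return lexicon, fallback


def tag_tokens(tokens, lexicon, fallback):
    """Assign one tag per token; tokens missing from the lexicon get the fallback tag."""
    return [(t, lexicon.get(t.lower(), fallback)) for t in tokens]


lexicon, fallback = train_most_frequent_tag(TOY_CORPUS)
print(tag_tokens(["O", "PACIENTE", "nega", "febre", "alta"], lexicon, fallback))
# [('O', 'ART'), ('PACIENTE', 'N'), ('nega', 'V'), ('febre', 'N'), ('alta', 'N')]
```

The unseen adjective "alta" falls back to N, a small example of the out-of-vocabulary errors that context-aware neural taggers aim to reduce.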

Cited by 4 publications (7 citation statements)
References 16 publications
“…We achieved an F1 score and accuracy […] We also achieved superior results on the Mac-Morpho corpus using BERT-based models compared with models trained on other architectures reported in the literature, evidencing the impact of Transformer-based models, such as BERT, on NLP tasks. Regarding architectures: [9] used FLAIR; [17] […] We found that the lack of standardization of clinical texts directly impacted the results of the POStagger-BERTimbau and POStagger-BioBERTpt models. Both models were trained on Mac-Morpho, a corpus of news texts, and may not be robust enough to handle aspects particular to the clinical domain, such as extensive use of abbreviations and acronyms, uppercase terms, domain-specific vocabulary, flexible formatting, and atypical grammatical constructions [18].…”
Section: Results (mentioning, confidence: 96%)
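The statement above reports F1 and accuracy without saying how F1 was averaged across tags. As a hedged sketch (the averaging options and function names are assumptions, not the cited paper's code), token-level accuracy and per-tag F1 can be computed as follows:

```python
def _check(gold, pred):
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")


def accuracy(gold, pred):
    """Token-level accuracy: fraction of tokens whose predicted tag is correct."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1(gold, pred, average="macro"):
    """Per-tag F1, averaged over tags ("macro") or weighted by gold support ("weighted")."""
    _check(gold, pred)
    if average not in ("macro", "weighted"):
        raise ValueError("average must be 'macro' or 'weighted'")
    tags = sorted(set(gold) | set(pred))
    scores, supports = [], []
    for t in tags:
        tp = sum(g == t and p == t for g, p in zip(gold, pred))
        fp = sum(g != t and p == t for g, p in zip(gold, pred))
        fn = sum(g == t and p != t for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(score)
        supports.append(tp + fn)  # number of gold tokens carrying tag t
    if average == "macro":
        return sum(scores) / len(scores)
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


gold = ["ART", "N", "V", "N"]
pred = ["ART", "N", "V", "ADJ"]
print(accuracy(gold, pred))                  # 0.75
print(round(f1(gold, pred, "macro"), 4))     # 0.6667
print(round(f1(gold, pred, "weighted"), 4))  # 0.8333
```

For single-label tagging, where every token receives exactly one predicted tag, micro-averaged F1 equals accuracy, so reporting macro or weighted F1 alongside accuracy conveys more about per-tag performance.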
“…We randomly selected 50 sentences containing between 6 and 15 tokens, which were manually POS-annotated by a human linguist, referred to in this paper as human annotation. For comparison purposes, we also evaluated two clinical POS-tagger models, trained with Flair [9] and spaCy [10], referred to in this paper as Flair and spaCy. For evaluation, we split Portuguese contracted forms into the words that perform distinct functions, such as a preposition contracted with an article (e.g., "NA" was split into "EM + A").…”
Section: Discussion (mentioning, confidence: 99%)
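The preprocessing described in this statement (sampling 50 sentences of 6 to 15 tokens and splitting contractions such as "NA" into "EM + A") might look roughly like the sketch below. The contraction table, the function names, the fixed random seed, and the choice to count tokens before splitting are all hypothetical; a real pipeline would also need to handle ambiguous forms that a lookup table cannot resolve.

```python
import random

# Hypothetical, deliberately small table of preposition + article contractions.
# Ambiguous forms (e.g. "nos", which can also be a pronoun) are left out on purpose.
CONTRACTIONS = {
    "na": ("em", "a"), "no": ("em", "o"), "nas": ("em", "as"),
    "da": ("de", "a"), "do": ("de", "o"), "das": ("de", "as"), "dos": ("de", "os"),
    "ao": ("a", "o"), "aos": ("a", "os"),
}


def split_contractions(tokens):
    """Replace each known contraction with its component words, preserving case style."""
    out = []
    for tok in tokens:
        parts = CONTRACTIONS.get(tok.lower())
        if parts is None:
            out.append(tok)
        elif tok.isupper():
            out.extend(p.upper() for p in parts)  # "NA" -> "EM", "A"
        elif tok[0].isupper():
            out.extend([parts[0].capitalize(), *parts[1:]])  # "Na" -> "Em", "a"
        else:
            out.extend(parts)
    return out


def select_sentences(sentences, n=50, min_tokens=6, max_tokens=15, seed=42):
    """Randomly sample up to n tokenized sentences whose length lies in [min_tokens, max_tokens]."""
    eligible = [s for s in sentences if min_tokens <= len(s) <= max_tokens]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(eligible, min(n, len(eligible)))


print(split_contractions(["Paciente", "internado", "NA", "UTI"]))
# ['Paciente', 'internado', 'EM', 'A', 'UTI']
```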