2021
DOI: 10.3233/shti210125
The Classification of Short Scientific Texts Using Pretrained BERT Model

Abstract: Automated text classification is a natural language processing (NLP) technology that could significantly facilitate scientific literature selection. A specific topical dataset of 630 article abstracts was obtained from the PubMed database. We proposed 27 parametrized options of the PubMedBERT model and 4 ensemble models to solve a binary classification task on that dataset. Three hundred tests with resamples were performed in each classification approach. The best PubMedBERT model demonstrated F1-score = 0.857 whi…
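The abstract reports model quality as an F1-score over repeated resampled tests. As a minimal sketch of that evaluation metric (the labels and predictions below are hypothetical, not from the paper's dataset):

```python
# Binary F1-score: harmonic mean of precision and recall.
# y_true/y_pred are illustrative placeholders, not the study's data.

def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]
print(round(f1_score(y_true, y_pred), 3))  # → 0.8
```

In the study, this score would be computed once per resampled test (300 per approach) and the distributions compared across the 27 PubMedBERT configurations and 4 ensembles.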

Cited by 10 publications (6 citation statements)
References 7 publications
“…Text classification can streamline the organization of large literature volumes by assigning predefined categories or labels to documents. This approach has been effectively applied to enhance reference prioritization in systematic reviews [17,18,19,20,21]. Text classification has also been used to better understand the structure of papers, for example by automatically predicting sections and headers in Electronic Health Records [22].…”
Section: NLP Tasks
Citation type: mentioning
confidence: 99%
“…The development of these methods is supported by the relevance that language models such as BERT [22] and GPT [23] are gaining. These pretrained models are increasingly involved in natural language tasks such as text classification [24], text summarization [25], [26], and text retrieval [27]. An embedding-based approach to keyword extraction can be seen in methods such as SIFRank [12], KBIR, and KeyBART [13], which use embeddings from pretrained language models (ELMo [28], RoBERTa [29], and BART [30], respectively) to determine the key terms of a document.…”
Section: B. Automatic Keyword Extraction (AKE)
Citation type: mentioning
confidence: 99%
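The embedding-based keyword extraction the statement above describes scores each candidate phrase by how close its embedding lies to the document's embedding. A toy sketch of that ranking step, with made-up 3-dimensional vectors standing in for embeddings a pretrained model (e.g. ELMo or RoBERTa) would produce:

```python
# Illustrative embedding-based keyword ranking: rank candidate phrases
# by cosine similarity to the document embedding. The vectors here are
# hypothetical toy values, not real model outputs.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

doc_vec = [0.9, 0.1, 0.3]                       # document embedding (toy)
candidates = {                                   # candidate phrase embeddings (toy)
    "text classification": [0.8, 0.2, 0.4],
    "weather": [0.1, 0.9, 0.0],
    "language model": [0.7, 0.1, 0.5],
}

ranked = sorted(candidates, key=lambda c: cosine(doc_vec, candidates[c]),
                reverse=True)
print(ranked[0])  # → text classification
```

Methods like SIFRank add refinements on top of this core idea (sentence-level weighting, phrase candidate filtering), but the similarity-based ranking is the common backbone.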
“…Text mining plays a fundamental role in research [47,48] and medicine [49–54], and has important applications in several other fields, such as psychiatry [55], risk management [56], financial domains [57], finance [58], service management [59], social networks [60], social media [61], education and training [62], policy-making [63], and agriculture [64], among others.…”
Section: Text Mining
Citation type: mentioning
confidence: 99%