Fine-grained classification of social science journal articles using textual data: A comparison of supervised machine learning approaches

Eykens, Joshua; Guns, Raf; Engels, Tim

doi:10.1162/qss_a_00106

Cited by 15 publications

(22 citation statements)

References 43 publications

“…Scientometric studies usually focus on authorship or measurement of journal or professional association contributions. However, they may also examine terms that appear in titles, abstracts, full texts of book chapters and journal articles, or keywords assigned by editors to published articles or publishing houses [9,21,22,23]. González-Alcaide et al [24] used scientometric analysis to identify the main research interests and directions on Chagas cardiomyopathy in the MEDLINE database.…”

Section: Literature Review (mentioning)

confidence: 99%

“…At the same time, it helps academic institutions and scientific literature management platforms analyze the development direction of disciplines [6], facilitates the exploration of knowledge production and dissemination, and accelerates the rapid development of scientific research [7,8]. Acknowledging the advantages of scientometric analysis, it has been widely used to evaluate leading scientific researchers or publications [9], examine the structure of a scientific field's network [10,11], reveal emerging issues [12], and help researchers study the development of research fields and disciplines by categorizing documents along multiple dimensions [4].…”

Section: Introduction (mentioning)

confidence: 99%

“…Typically, scientometric studies focus on broad classification of published articles based on primary or secondary subjects or disciplines. Currently, scientometric studies classify publications using generic classification systems, such as the Web of Science (WoS) subject categories and the Field of Science and Technology Classification (FOS) [9]. In their current form, these systems are too broad to adequately reflect the more complex, fine-grained cognitive reality; therefore, their scope is limited and they only indicate broad scientific domains or general disciplines.…”

Section: Introduction (mentioning)

confidence: 99%

“…A recent study by Wahid et al [11] found that focused research communities can be distinguished based on publication associations, and publication practices and patterns may vary within these communities. However, fine-grained classification is challenging for researchers because it is not clear to what extent authors in particular fields collaborate to disseminate new findings and knowledge [9,14,15]. Such finer classification usually involves two aspects.…”

Section: Introduction (mentioning)

confidence: 99%

“…Commonly used machine learning algorithms include support vector machines (SVM), Naive Bayes classifiers, and the K-nearest neighbor model (KNN) [16]. However, with the abundance and diversity of scientific research and the exponential increase in scientific output, classification methods based on general scientometric information and traditional machine learning methods have shown significant shortcomings in terms of coarse classification and insufficient accuracy [1,9,17]. Moreover, the data elements used by existing scientific literature classification systems are primarily derived from explicit scientometric information such as titles, abstracts, and keywords.…”

Section: Introduction (mentioning)

confidence: 99%

Scientometric Analysis and Classification of Research Using Convolutional Neural Networks: A Case Study in Data Science and Analytics

et al. 2022

With the increasing development of published literature, classification methods based on bibliometric information and traditional machine learning approaches encounter performance challenges related to overly coarse classifications and low accuracy. This study presents a deep learning approach for scientometric analysis and classification of scientific literature based on convolutional neural networks (CNN). Three dimensions, namely publication features, author features, and content features, were divided into explicit and implicit features to form a set of scientometric terms through explicit feature extraction and implicit feature mapping. The weighted scientometric term vectors are fitted into a CNN model to achieve dual-label classification of literature based on research content and methods. The effectiveness of the proposed model is demonstrated using an application example from the data science and analytics literature. The empirical results show that the scientometric classification model proposed in this study performs better than comparable machine learning classification methods in terms of precision, recognition, and F1-score. It also exhibits higher accuracy than deep learning classification based solely on explicit and dominant features. This study provides a methodological guide for fine-grained classification of scientific literature and a thorough investigation of its practice.

Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures: An application in Physics

Pech, Delgado, Sorella. 2022. Asso for Info Science & Tech

Classifying papers according to the fields of knowledge is critical to clearly understand the dynamics of scientific (sub)fields, their leading questions, and trends. Most studies rely on journal categories defined by popular databases such as WoS or Scopus, but some experts find that those categories may not correctly map the existing subfields nor identify the subfield of a specific article. This study addresses the classification problem using data from each paper (Abstract, Title, Keywords, and the KeyWords Plus) and the help of experts to identify the existing subfields and journals exclusive of each subfield. These “exclusive journals” are critical to obtain, through a pattern detection procedure that uses machine learning techniques (from software NVivo), a list of the frequent terms that are specific to each subfield. With that list of terms and with the help of optimization procedures, we can identify to which subfield each paper most likely belongs. This study can contribute to support scientific policy‐makers, funding, and research institutions—via more accurate academic performance evaluations—, to support editors in their tasks to redefine the scopes of journals, and to support popular databases in their processes of refining categories.

FoRC@NSLP2024: Overview and Insights from the Field of Research Classification Shared Task

Abu Ahmad, Borisova, Rehm. 2024. Lecture Notes in Computer Science

This article provides an overview of the Field of Research Classification (FoRC) shared task conducted as part of the Natural Scientific Language Processing Workshop (NSLP) 2024. The FoRC shared task encompassed two subtasks: the first was a single-label multi-class classification of scholarly papers across a taxonomy of 123 fields, while the second focused on fine-grained multi-label classification within computational linguistics, using a taxonomy of 170 (sub-)topics. The shared task received 13 submissions for the first subtask and two for the second, with teams surpassing baseline performance metrics in both subtasks. The winning team for subtask I employed a multi-modal approach integrating metadata, full-text, and images from publications, achieving a weighted F1 score of 0.75, while the winning team for the second subtask leveraged a weakly supervised X-transformer model enriched with automatically labelled data, achieving a micro F1 score of 0.56 and a macro F1 of 0.43.
