A comparison of approaches for imbalanced classification problems in the context of retrieving relevant documents for an analysis

Wankmüller, Sandra

doi:10.1007/s42001-022-00191-7

J Comput Soc Sc

2022

Abstract: One of the first steps in many text-based social science studies is to retrieve documents that are relevant for an analysis from large corpora of otherwise irrelevant documents. The conventional approach in social science to address this retrieval task is to apply a set of keywords and to consider those documents to be relevant that contain at least one of the keywords. But the application of incomplete keyword lists has a high risk of drawing biased inferences. More complex and costly methods such as query ex…

Cited by 1 publication

References 91 publications

(115 reference statements)

Introduction to Neural Transfer Learning With Transformers for Social Science Text Analysis

Wankmüller

2022

Sociological Methods & Research

Transformer-based models for transfer learning have the potential to achieve high prediction accuracies on text-based supervised learning tasks with relatively few training data instances. These models are thus likely to benefit social scientists who seek text-based measures that are as accurate as possible but have only limited resources for annotating training data. To enable social scientists to leverage these potential benefits for their research, this article explains how these methods work, why they might be advantageous, and what their limitations are. Additionally, three Transformer-based models for transfer learning (BERT, RoBERTa, and the Longformer) are compared to conventional machine learning algorithms on three applications. Across all evaluated tasks, textual styles, and training data set sizes, transfer learning with Transformers consistently outperforms the conventional models, thereby demonstrating the benefits these models can bring to text-based social science research.
