Advantages of Using a Spell Checker in Text Mining Pre-Processes

Quillo-Espino, Jhonathan; Romero-González, Rosa María; Lara-Guevara, Alberto

doi:10.4236/jcc.2018.611004

Cited by 8 publications

(4 citation statements)

References 13 publications


“…Processing and cleaning text data for semi-automated classification can require varying amounts of efforts and techniques, however, a set of typically used techniques has already been established: this set includes spellchecking (Quillo-Espino et al, 2018), lowercasing (Foster et al, 2020), stemming (Jivani, 2011; Bao et al, 2014; Singh and Gupta, 2017), lemmatization (Bao et al, 2014; Banks et al, 2018; Symeonidis et al, 2018; Foster et al, 2020), stopword removal (Foster et al, 2020), and different ways of text enrichment/adding of linguistic features (Foster et al, 2020). We will systematically review these options below and justify our choices (for an overview, see Table 2).…”

Section: Survey Motivation in the GESIS Panel (mentioning)

confidence: 99%

The semi-automatic classification of an open-ended question on panel survey motivation and its application in attrition analysis

Haensch, Weiß, Steins et al. 2022

Front. Big Data


In this study, we demonstrate how supervised learning can extract interpretable survey motivation measurements from a large number of responses to an open-ended question. We manually coded a subsample of 5,000 responses to an open-ended question on survey motivation from the GESIS Panel (25,000 responses in total); we utilized supervised machine learning to classify the remaining responses. We can demonstrate that the responses on survey motivation in the GESIS Panel are particularly well suited for automated classification, since they are mostly one-dimensional. The evaluation of the test set also indicates very good overall performance. We present the pre-processing steps and methods we used for our data, and by discussing other popular options that might be more suitable in other cases, we also generalize beyond our use case. We also discuss various minor problems, such as a necessary spelling correction. Finally, we can showcase the analytic potential of the resulting categorization of panelists' motivation through an event history analysis of panel dropout. The analytical results allow a close look at respondents' motivations: they span a wide range, from the urge to help to interest in questions or the incentive and the wish to influence those in power through their participation. We conclude our paper by discussing the re-usability of the hand-coded responses for other surveys, including similar open questions to the GESIS Panel question.



“…Processing and cleaning text data for semi-automated classification can require varying amounts of efforts and techniques, however, a set of typically used techniques has already been established: this set includes spellchecking (Quillo-Espino et al, 2018), lowercasing (Foster et al, 2020), stemming (Jivani, 2011; Bao et al, 2014; Singh and Gupta, 2017), lemmatization (Bao et al, 2014; Banks et al, 2018; Symeonidis et al, 2018; Foster et al, 2020), stopword removal (Foster et al, 2020), and different ways of text enrichment/adding of linguistic features (Foster et al, 2020). We will systematically review these options below and justify our choices (for an overview, see Table 2).…”

Section: Pre-processing (mentioning)

confidence: 99%

The semi-automatic classification of an open-ended question on panel survey motivation and its application in attrition analysis

Haensch, Weiß, Steins et al. 2022

Preprint


In this study, we demonstrate how supervised learning can extract interpretable survey motivation measurements from a large number of responses to an open-ended question. We manually coded a subsample of 5,000 responses to an open-ended question on survey motivation from the GESIS Panel (25,000 responses in total); we utilized supervised machine learning to classify the remaining responses. We can demonstrate that the responses on survey motivation in the GESIS Panel are particularly well suited for automated classification, since they are mostly one-dimensional. The evaluation of the test set also indicates excellent overall performance. We present the pre-processing steps and methods we used for our data, and by discussing other popular options that might be more suitable in other cases, we also generalize beyond our use case. We also discuss various minor problems, such as a necessary spelling correction. Finally, we can showcase the analytic potential of the resulting categorization of panelists' motivation through an event history analysis of panel dropout. The analytical results allow a close look at respondents' motivations: they span a wide range, from the urge to help to interest in questions or the incentive and the wish to influence those in power through their participation. We conclude our paper by discussing the re-usability of the hand-coded responses for other surveys, including similar open questions to the GESIS Panel question.


“…It enhances the customer relationship. The security and privacy lacking in data are the disadvantages [5]. Moreover, Machine learning is a part of human life.…”

Section: Introduction (mentioning)

confidence: 99%

A Novel Support Vector Machine based Improved Aquila Optimizer-based Text Mining Mechanism for the Healthcare Applications

Sultanuddin S. J

2024

jes


Social media is one of the biggest contributors in every field. In healthcare applications, it helps to estimate the quality of the services provided by different hospitals and doctors. The services are analyzed using text mining techniques. Several text mining techniques have been proposed in recent times. However, effective text mining in the healthcare field remains a complicated task. Hence, we propose a novel Support Vector Machine (SVM) based Improved Aquila Optimizer (IAO) algorithm to enhance text mining from social media reviews. Using this, patients can easily evaluate the quality and services of particular clinics and doctors. The work includes preprocessing of the collected dataset, followed by discriminative least square regression (DLSR) to extract features from the preprocessed data. Experimental analysis is conducted to evaluate the performance of the proposed work. The results are compared with state-of-the-art works using different performance metrics. Thus, our proposed work can be used to mine text for healthcare applications.

