2018
DOI: 10.4236/jcc.2018.611004

Advantages of Using a Spell Checker in Text Mining Pre-Processes

Abstract: This work analyzes how a text mining pipeline behaves when a spell checker is integrated as an extra pre-process during its first stage. Different models were analyzed, and the most complete one was chosen, treating the pre-processes as the initial part of the text mining process. Algorithms for the Spanish language were developed and adapted, and the methodology was tested through the analysis of 2363 words. A notation capable of removing special and unwanted characters was created. Executio…
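The pre-processing idea summarized in the abstract (strip special and unwanted characters, then run a spell checker before the rest of the text mining pipeline) can be illustrated with a minimal Python sketch. This is not the authors' algorithm or notation: the vocabulary, the function names `clean`, `correct`, and `preprocess`, and the similarity cutoff are all hypothetical, and the standard-library `difflib.get_close_matches` stands in for a real Spanish spell checker.

```python
import re
import unicodedata
from difflib import get_close_matches

# Hypothetical toy vocabulary (assumption); a real system would load a full
# Spanish lexicon. Kept as a sorted list so lookups are deterministic.
VOCABULARY = sorted(
    ["el", "análisis", "de", "texto", "minería", "corrector", "ortográfico"]
)


def clean(text: str) -> str:
    """Lowercase the text and replace punctuation, digits and underscores
    with spaces, keeping accented letters intact."""
    text = unicodedata.normalize("NFC", text).lower()
    return re.sub(r"[^\w\s]|[\d_]", " ", text)


def correct(word: str, vocabulary=VOCABULARY, cutoff: float = 0.75) -> str:
    """Return the closest vocabulary entry, or the word itself when it is
    already known or nothing is similar enough (cutoff is an assumption)."""
    if word in vocabulary:
        return word
    matches = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def preprocess(text: str) -> list[str]:
    """Clean the text, split it on whitespace, and spell-correct each token."""
    return [correct(token) for token in clean(text).split()]


if __name__ == "__main__":
    print(preprocess("El analisis de texo!!"))
```

Running the spell-correction step before later steps such as stopword removal or stemming means those steps see normalized tokens; whether that ordering matches the paper's pipeline is not stated in the truncated abstract.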

Citations: cited by 8 publications (4 citation statements)
References: 13 publications
“…Processing and cleaning text data for semi-automated classification can require varying amounts of efforts and techniques, however, a set of typically used techniques has already been established: this set includes spellchecking (Quillo-Espino et al, 2018), lowercasing (Foster et al, 2020), stemming (Jivani, 2011; Bao et al, 2014; Singh and Gupta, 2017), lemmatization (Bao et al, 2014; Banks et al, 2018; Symeonidis et al, 2018; Foster et al, 2020), stopword removal (Foster et al, 2020), and different ways of text enrichment/adding of linguistic features (Foster et al, 2020). We will systematically review these options below and justify our choices (for an overview, see Table 2).…”
Section: Survey Motivation In the Gesis Panel (mentioning)
confidence: 99%
“…Processing and cleaning text data for semi-automated classification can require varying amounts of efforts and techniques, however, a set of typically used techniques has already been established: this set includes spellchecking (Quillo-Espino et al, 2018), lowercasing (Foster et al, 2020), stemming (Jivani, 2011; Bao et al, 2014; Singh and Gupta, 2017), lemmatization (Bao et al, 2014; Banks et al, 2018; Symeonidis et al, 2018; Foster et al, 2020), stopword removal (Foster et al, 2020), and different ways of text enrichment/adding of linguistic features (Foster et al, 2020). We will systematically review these options below and justify our choices (for an overview, see Table 2).…”
Section: Pre-processing (mentioning)
confidence: 99%
“…It enhances the customer relationship. The security and privacy lacking in data are the disadvantages [5]. Moreover, Machine learning is a part of human life.…”
Section: Introduction (mentioning)
confidence: 99%