2020
DOI: 10.14201/adcaij2020924968
Influence of Pre-Processing Strategies on the Performance of ML Classifiers Exploiting TF-IDF and BOW Features

Abstract: Data analytics and its associated applications have recently become important fields of study. The main concern for researchers nowadays is the massive amount of data produced every minute as people constantly share thoughts and opinions about the things that concern them. Social media data, however, is still unstructured and dispersed, and remains hard to handle; a strong foundation needs to be developed so that it can be used as valuable information on a particular topic. Processing …

Cited by 43 publications (10 citation statements)
References 35 publications
“…TF-IDF not only vectorizes the text data but also quantifies the features among the whole corpus [45]. TF-IDF is obtained by taking a product of two metrics including the number of times a word has appeared in a document and the IDF (inverse document frequency) of a word in a collection of documents [46]. It assigns weights with respect to the importance of a term in a given corpus.…”
Section: Methods (mentioning)
confidence: 99%
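The statement above describes TF-IDF as the product of a term-frequency count and the inverse document frequency of that term. A minimal sketch of this computation, assuming raw counts for TF and the standard log-scaled IDF (the cited works may use smoothed or normalized variants):

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights for a small corpus of tokenized documents.

    TF is the raw count of a term in a document; IDF is the log of the
    total number of documents divided by the number of documents that
    contain the term. The weight is their product.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights

corpus = [["data", "analytics", "data"], ["social", "media", "data"]]
print(tf_idf(corpus))
```

Terms that occur often in one document but rarely across the corpus receive the largest weights, which is the sense in which TF-IDF reflects the importance of a term in a given corpus.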
“…This method is called One-Hot, that is, one-hot coding [18]. You can also adopt the strategy of TF-IDF (term frequency, inverse document frequency) [19, 20], in which TF (term frequency) is called word frequency, and its importance is shown in …”
Section: Relevant Technical Theory Of Intelligent Grading Of English ... (mentioning)
confidence: 99%
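The quoted passage contrasts one-hot coding with TF-IDF weighting. A minimal sketch of the one-hot (binary presence) representation, assuming the documents are already tokenized; the vocabulary and example texts here are illustrative only:

```python
def one_hot(docs):
    """Binary document vectors: 1 if the term occurs in the document, else 0."""
    vocab = sorted({term for doc in docs for term in doc})
    vectors = [[1 if term in doc else 0 for term in vocab] for doc in docs]
    return vectors, vocab

vectors, vocab = one_hot([["data", "analytics"], ["social", "media", "data"]])
print(vocab)    # ['analytics', 'data', 'media', 'social']
print(vectors)  # [[1, 1, 0, 0], [0, 1, 1, 1]]
```

Unlike TF-IDF, one-hot coding records only presence or absence, so every occurring term is treated as equally important.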
“…The text is converted into a bag of words in BOW (11). With size m x n, the feature matrix is created.…”
Section: BOW Features (mentioning)
confidence: 99%
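The quoted passage describes BOW as an m x n feature matrix, one row per document and one column per vocabulary term. A minimal sketch using scikit-learn's CountVectorizer; the library choice and the example corpus are assumptions for illustration, not necessarily the setup used in the cited work:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "data analytics and its applications",
    "social media data is unstructured",
    "pre processing improves classifier performance",
]

# Each row corresponds to one document (m documents) and each column to one
# vocabulary term (n terms); cell values are raw term counts, i.e. the BOW features.
vectorizer = CountVectorizer()
bow_matrix = vectorizer.fit_transform(corpus)   # sparse m x n matrix

print(bow_matrix.shape)                          # (3, number_of_terms)
print(vectorizer.get_feature_names_out())        # the n vocabulary terms
print(bow_matrix.toarray())                      # dense view of the count matrix
```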