2020
DOI: 10.26782/jmcms.spl.5/2020.01.00010

A Comparative Approach of Text Mining: Classification, Clustering and Extraction Techniques

Abstract: The amount of text generated each day is increasing dramatically. Computers cannot easily process and interpret this enormous volume of mostly unstructured text, so efficient and effective techniques and algorithms are required to discover useful patterns. Text mining, the process of extracting meaningful information from text, has received considerable attention in recent years. In this paper, we discuss several of the most basic tasks and techniques of text mining, including pre-processing, clas…

Citations: cited by 7 publications (4 citation statements)
References: 0 publications
“…Most words only appear once in the texts that include them. Therefore, the term frequency‐inverse document frequency (TF‐IDF) [36] metric is not representative. To address this issue, some researchers enrich data contexts with external information and resources such as Wikipedia [37] and ontologies [38].…”
Section: Related Work (mentioning)
confidence: 99%
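
The point made in this statement can be illustrated with a minimal sketch. It assumes one common TF-IDF formulation (relative term frequency multiplied by log(N / df)); the citing paper does not say which variant it uses, and the `tf_idf` helper and the three short texts are hypothetical. When every word occurs once in its text, the term-frequency factor is the same for all words in that text, so the weights are driven by document frequency alone.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    This is one common TF-IDF variant, chosen here as an assumption.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical short texts: every word appears exactly once in its text.
short_texts = ["pay rent march", "pay gas bill", "refund rent deposit"]
weights = tf_idf(short_texts)

# Check that no term repeats inside any single text.
all_once = all(
    count == 1
    for text in short_texts
    for count in Counter(text.split()).values()
)

# Within the first text, "pay" and "rent" get the same weight because both
# occur in two of the three texts; only document frequency separates terms.
same_weight = weights[0]["pay"] == weights[0]["rent"]
```

In this sketch the term-frequency factor carries no information inside a single text, which is one way to read the claim that TF-IDF is "not representative" for very short texts.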
“…Indeed, we need a numerical representation of the text to perform calculations. We have chosen to use the BoW [36] representation since it ignores the grammatical structure. The BoW model is utilized to transform the transaction descriptions into a representation better suited for machine learning.…”
Section: Sbm Processing Workflow and Models (mentioning)
confidence: 99%
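
As a rough illustration of the bag-of-words (BoW) idea described in this statement, the sketch below builds count vectors with the Python standard library only. The `bag_of_words` helper and the sample transaction descriptions are hypothetical and are not taken from the citing paper. Two descriptions that contain the same words in a different order map to the same vector, which is what "ignores the grammatical structure" means in practice.

```python
from collections import Counter


def bag_of_words(texts):
    """Return (vocabulary, count vectors) for whitespace-tokenized texts."""
    tokenized = [text.lower().split() for text in texts]
    # Sorted vocabulary gives every term a fixed column position.
    vocab = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


# Hypothetical transaction descriptions; the first two differ only in word order.
descriptions = [
    "card payment grocery store",
    "grocery store card payment",
    "salary payment",
]
vocab, vectors = bag_of_words(descriptions)

# Word order is discarded, so the first two descriptions are indistinguishable.
order_ignored = vectors[0] == vectors[1]
```

The resulting fixed-length count vectors can be passed to a standard classifier, at the cost of losing any information carried by word order.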