2015
DOI: 10.1155/2015/695720

Distance Variance Score: An Efficient Feature Selection Method in Text Classification

Abstract: With the rapid development of web applications such as social networks, a large amount of electronic text data has accumulated and become available on the Internet, which has increased interest in text mining. Text classification is one of the most important subfields of text mining. Text documents are often represented as a high-dimensional sparse document term matrix (DTM) before classification. Feature selection is essential for text classification because of the high dimensionality and sparsity of the DTM…
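To make the abstract's point about sparsity and feature selection concrete, here is a minimal sketch on a toy DTM. It is not the Distance Variance Score from the paper, whose definition is not given in this excerpt; it only shows a generic unsupervised filter that ranks terms by the variance of their counts across documents. The function names, terms, and counts are all made up for illustration.

```python
from statistics import pvariance


def sparsity(dtm):
    """Fraction of zero cells in a document-term matrix (rows = documents)."""
    cells = sum(len(row) for row in dtm)
    zeros = sum(1 for row in dtm for value in row if value == 0)
    return zeros / cells


def top_k_by_variance(dtm, terms, k):
    """Rank terms by the population variance of their column; keep the top k.

    Assumption: this is a plain variance filter used as a stand-in,
    not the paper's Distance Variance Score.
    """
    columns = list(zip(*dtm))  # transpose: one tuple of counts per term
    scored = sorted(
        zip(terms, (pvariance(column) for column in columns)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [term for term, _ in scored[:k]]


# Hypothetical toy data: 4 documents, 5 terms.
terms = ["goal", "match", "vote", "party", "the"]
dtm = [
    [3, 2, 0, 0, 5],
    [4, 1, 0, 0, 5],
    [0, 0, 2, 3, 5],
    [0, 0, 3, 2, 5],
]

print(sparsity(dtm))
print(top_k_by_variance(dtm, terms, 4))
```

A term such as "the", which occurs equally often in every document, has zero variance and is dropped first, which is the intuition behind filtering uninformative columns before classification.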

Citations: cited by 14 publications (9 citation statements)
References: 28 publications
“…The Laplacian technique simply sharpens an image by emphasizing the high gradient information in an image [26]. In addition, the Laplacian score is proved discriminative in the application of several areas [27]. The Laplacian score for blur detection is used as the examination criteria in this study.…”
Section: Blur Detection and Laplace Operator
Citation type: mentioning
confidence: 99%
“…The removal of textual elements that – in isolation – add little value to the information may occur in the following manner:
Stage 2.1: Remove punctuation;
Stage 2.2: Remove numbers;
Stage 2.3: Remove accents;
Stage 2.4: Remove special characters;
Stage 2.5: Transform all words to lower case;
Stage 2.6: Remove white spaces;
Stage 2.7: Remove stop words;
Stage 2.8: Remove custom stop words; and
Stage 2.9: Synonyms.
Stage 3: Stemming or Lemmatisation (Optional): according to Soares et al (2008); Bezerra and Guimarães (2014), in this stage, the number of tokens is decreased by extracting the suffixes and prefixes that form each token, i.e. a linguistic standardisation occurs, in which the variant forms of a term are reduced to a common form called a stem.
Stage 4: Frequency matrix: according to Fan et al (2006); Feldman and Sanger (2007); Bezerra and Guimarães (2014); Wang and Hong (2015), this stage entails categorising stems and associating them with their respective frequency of occurrence in the texts analysed, thereby enabling inferences about their proximities, distances, synonyms and related terms.…”
Section: Theoretical Review
Citation type: mentioning
confidence: 99%
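The stages quoted above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the citing authors' implementation: the stop-word list, the suffix-stripping rule, and all function names are hypothetical, and Stages 2.8 (custom stop words) and 2.9 (synonyms) are left out.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to"}


def strip_accents(text):
    """Stage 2.3: decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(token):
    """Stage 3: crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    text = strip_accents(text)              # Stage 2.3
    text = text.lower()                     # Stage 2.5
    text = re.sub(r"[^a-z\s]", " ", text)   # Stages 2.1, 2.2, 2.4
    tokens = text.split()                   # Stage 2.6 (split collapses white space)
    tokens = [t for t in tokens if t not in STOP_WORDS]  # Stage 2.7
    return [naive_stem(t) for t in tokens]  # Stage 3


def frequency_matrix(docs):
    """Stage 4: one row per document, one column per stem, cells are counts."""
    counts = [Counter(preprocess(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = ["The cafés are opening in 2015!", "Opened cafes, opened doors."]
vocab, matrix = frequency_matrix(docs)
print(vocab)   # ['cafe', 'door', 'open']
print(matrix)  # [[1, 0, 1], [1, 1, 2]]
```

In practice the suffix rule would be replaced by an established stemmer or lemmatiser; the point here is only the order of the stages and the shape of the resulting matrix.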
“…In recent years, with the development of science and technology, natural language processing (NLP) has been greatly developed due to its extensive application scenarios [1][2][3][4]. Text classification is an important branch of NLP.…”
Section: Introduction
Citation type: mentioning
confidence: 99%