2017
DOI: 10.14419/ijet.v7i1.1.10146

An novel cluster based feature selection and document classification model on high dimension trec data

Abstract: TREC text documents are complex to analyze the features its relevant similar documents using the traditional document similarity measures. As the size of the TREC repository is increasing, finding relevant clustered documents from a large collection of unstructured documents is a challenging task. Traditional document similarity and classification models are implemented on homogeneous TREC data to find essential features for document entities that are similar to the TREC documents. Also, most of the traditiona…

Cited by 2 publications (2 citation statements)
References 12 publications

“…The document clustering process organizes similar documents into clusters using statistical measures on frequencies of words, phrases, or sentences. 1 Based on some feature matrix, the document clustering methods group the documents to ensure that the documents within a cluster are close than those from other clusters. 2,3 In order to cluster textual data, the data must first be properly mapped to a vector space, that is, it is important to vectorize each text document and then cluster the vectors using clustering algorithms such as K-means.…”
Section: Introduction
confidence: 99%
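
The pipeline described in the statement above (vectorize each document, then cluster the vectors with K-means) can be sketched as follows. This is a minimal illustration, not the method of the cited paper or of the citing one: the whitespace tokenizer, the smoothed TF-IDF weighting, the farthest-first initialisation, and the names `tfidf_vectors`, `kmeans`, and `docs` are all assumptions chosen to keep the example small and deterministic.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumption) so no term gets a zero weight.
        vec = [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=50):
    """Plain K-means; returns one cluster label per input vector."""
    # Farthest-first initialisation (an assumption, used for determinism).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two documents about cats, two about dogs.
docs = [
    "cats purr and cats sleep",
    "dogs bark and dogs run",
    "cats sleep all day",
    "dogs run and bark loudly",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

On this toy corpus the two cat documents should share one label and the two dog documents the other. A real collection such as TREC would need proper tokenization, stop-word handling, and sparse vectors, none of which are shown here.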
“…Through hash-tags, it is possible for larger number of audience to view the tweets. All these properties of tweets are taken into consideration along with the regular textual features during feature selection [12][13][14][15] [16][17]. Twitter is the most popular microblogging site and hence generates enormous amounts of data suitable for opinion mining as compared to any other social media.…”
Section: Introduction
confidence: 99%