2012
DOI: 10.3923/jeasci.2012.342.347

Study of Ontology or Thesaurus Based Document Clustering and Information Retrieval

Abstract: Document clustering generates clusters from the whole document collection automatically and is used in many fields, including data mining and information retrieval. Clustering text data faces a number of new challenges. Among others, the volume of text data, dimensionality, sparsity and complex semantics are the most important ones. These characteristics of text data require clustering techniques to be scalable to large and high dimensional data, and able to handle sparsity and semantics. In the traditional ve…

Cited by 9 publications (5 citation statements)
References 10 publications
“…The algorithm was executed with important step of synonym grouping and without this important step. The outcome of basic TF-IDF algorithm is also presented as a basis for comparison and also compared with the algorithm that extends term set by adding synonyms as suggested by Bharathi (Bharathi et al, 2012; Sedding et al, 2004). The resulted entropy in Table 4 shows significant improvement with grouping measure.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…One way to solve this problem is inclusion of synonyms and finding out hidden relationship between the documents. and G. Bharathi et al (2012) have given thought, in which synonyms are included and existing term set prepared by TF-IDF are enriched. This gives slightly improved results on some data sets.…”
Section: Similarity and Performance Measures
Citation type: mentioning
confidence: 99%
“…Ontology approach clusters [19] the input text files based on idea and relations to express learning and goal without semantic vagueness. It centers around the semantics of the language structure which gives quality groups.…”
Section: B Ontological Approach
Citation type: mentioning
confidence: 99%
“…Some other studies are nevertheless closer to the scope of our work. Some researchers have for instance studied the impact of integrating knowledge base information in clustering algorithms [7]. To the best of our knowledge, Hotho et al [8][9][10] have been the first to consider this kind of approach.…”
Section: Semantic Clustering
Citation type: mentioning
confidence: 99%