2007
DOI: 10.1007/s10791-007-9035-7

A new unsupervised method for document clustering by using WordNet lexical and conceptual relations

Abstract: Text document clustering provides an effective and intuitive navigation mechanism to organize a large amount of retrieval results by grouping documents in a small number of meaningful classes. Many well-known methods of text clustering make use of a long list of words as vector space which is often unsatisfactory for a couple of reasons: first, it keeps the dimensionality of the data very high, and second, it ignores important relationships between terms like synonyms or antonyms. Our unsupervised method solve…

citations
Cited by 45 publications
(5 citation statements)
references
References 30 publications
“…Recent work (Hotho et al, 2003; Sedding and Kazakov, 2004; Reforgiato Recupero, 2007), considers not only syntactic information, obtained from the terms present in a document, but also semantic relationships between terms. These approaches are mostly based on WordNet (Fellbaum, 1998), which is a lexical database that groups English words into sets of synonyms, called synsets.…”
Section: Related Work (mentioning)
confidence: 99%
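To make the synset idea in the statement above concrete, here is a minimal Python sketch of counting concepts instead of raw words. The `TOY_SYNSETS` table and the `concept_vector` helper are hypothetical and hand-written for illustration; they are not real WordNet data and not the method of any of the cited papers.

```python
from collections import Counter

# Hypothetical toy lexicon (NOT real WordNet data): each word maps to the
# identifier of the synonym set it belongs to.
TOY_SYNSETS = {
    "car": "syn_car",
    "automobile": "syn_car",
    "auto": "syn_car",
    "film": "syn_film",
    "movie": "syn_film",
}

def concept_vector(tokens, synsets=TOY_SYNSETS):
    """Count synset identifiers instead of surface words.

    Words missing from the lexicon are kept as their own dimension.
    """
    return Counter(synsets.get(token, token) for token in tokens)

vector = concept_vector(["car", "automobile", "film", "plot"])
```

In this toy case "car" and "automobile" fall into a single dimension, so synonyms no longer inflate the vector space.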
“…Concerning the clustering algorithm several approaches are followed in the literature. In (Hotho et al, 2003; Sedding and Kazakov, 2004; Reforgiato Recupero, 2007) a variant of the K-means, the Bi-Section-K-means is used, stating that this method frequently outperforms the standard K-means. In (Boyack et al, 2011) a more complex partitioning of the document collection is proposed.…”
Section: Related Work (mentioning)
confidence: 99%
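For readers unfamiliar with the Bi-Section-K-means variant named above, the following is a rough Python sketch of the general bisecting idea: start with one cluster and repeatedly split one cluster in two with 2-means until k clusters exist. Several choices here are assumptions made for illustration and are not taken from the cited papers: the cluster to split is the one with the largest sum of squared errors, seeding is deterministic, and Euclidean distance is used (document clustering commonly uses cosine similarity instead).

```python
from math import dist

def two_means(points, iters=20):
    """Split points into two groups with plain 2-means."""
    # Assumption: deterministic seeding with the first point and the point
    # farthest from it; random seeding is more common in practice.
    centers = [points[0], max(points, key=lambda p: dist(p, points[0]))]
    groups = [list(points), []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            nearest = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            groups[nearest].append(p)
        new_centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return groups

def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    centroid = tuple(sum(xs) / len(cluster) for xs in zip(*cluster))
    return sum(dist(p, centroid) ** 2 for p in cluster)

def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        worst = max(splittable, key=sse)
        clusters.remove(worst)
        left, right = two_means(worst)
        if not left or not right:
            # Degenerate split (e.g. identical points): peel off one point.
            left, right = worst[:1], worst[1:]
        clusters.extend([left, right])
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1)]
clusters = bisecting_kmeans(points, 3)
```

On these toy points the first bisection separates the group near the origin from the rest, and the second separates the remaining two groups.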
“…The term expansion process consists of replacing terms of a document with a set of co-related terms. This procedure may be carried out in different ways, often by using an external knowledge resource which usually helps in obtaining successful results [46][47][48].…”
Section: Self-term Expansion (mentioning)
confidence: 99%
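The term expansion step described in the statement above can be pictured with a small Python sketch. The `RELATED_TERMS` table stands in for an external knowledge resource and is entirely hypothetical; the citing paper's actual resource and procedure are not given in the excerpt.

```python
# Hypothetical stand-in for an external knowledge resource: each term maps
# to the set of co-related terms that should replace it.
RELATED_TERMS = {
    "car": ["car", "automobile", "vehicle"],
    "film": ["film", "movie"],
}

def expand_terms(tokens, related=RELATED_TERMS):
    """Replace each term with its co-related terms; unknown terms are kept as-is."""
    expanded = []
    for token in tokens:
        expanded.extend(related.get(token, [token]))
    return expanded

expanded = expand_terms(["car", "chase", "film"])
```

Keeping the original term in its own expansion list is a design choice of this sketch, so that expansion never discards information already present in the document.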
“…To do so, we use the dual document representation (concepts and terms) to create a generative language model for each concept, which bridges the gap between vocabulary terms and concepts. Related work has also used textual representations to represent concepts, see e.g., [1,11], however, there are two important differences. First, we use statistical language modeling techniques to parametrize the concept models, by leveraging the dual representation of the documents.…”
Section: Introduction (mentioning)
confidence: 99%