2009
DOI: 10.1007/s10791-009-9108-x

Document clustering of scientific texts using citation contexts

Abstract: Document clustering has many important applications in the area of data mining and information retrieval. Many existing document clustering techniques use the “bag-of-words” model to represent the content of a document. However, this representation is only effective for grouping related documents when these documents share a large proportion of lexically equivalent terms. In other words, instances of synonymy between related documents are ignored, which can reduce the effectiveness of applications using a st…
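The synonymy problem described in the abstract can be reproduced with a minimal bag-of-words sketch: two related phrases that use different words for the same concepts receive a cosine similarity of zero, while phrases that share literal terms do not. The example phrases and the helpers `bag_of_words` and `cosine` are illustrative only, and tokenization is assumed to be a plain lowercase whitespace split.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count term occurrences."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Related phrases that use synonyms instead of lexically equivalent terms.
doc_a = bag_of_words("automobile engine repair")
doc_b = bag_of_words("car motor maintenance")
# A phrase that shares two literal terms with doc_a.
doc_c = bag_of_words("automobile engine design")

print(cosine(doc_a, doc_b))  # 0.0: the model sees no overlap at all
print(cosine(doc_a, doc_c))
```

Because each term is an independent dimension, "automobile" and "car" are as unrelated to the model as any two random words; this is the gap that the paper's citation-context representation is meant to address.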

Cited by 70 publications (56 citation statements). References: 50 publications.
“…It is common in document clustering literature to use tf-idf weighting, mostly with length normalization (Slonim and Tishby 2000; Steinbach et al 2000; Zhao and Karypis 2002; Xu et al 2003; Zhao and Karypis 2004; Hu et al 2009; Aljaber et al 2010 to name a few such works). There is some research from fields related to clustering, such as classification, that indicate that idf is an important part of feature weighting for those fields, while tf is (surprisingly) not as useful (Wilbur and Kim 2009).…”
Section: Introduction (mentioning; confidence: 99%)
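The statement above names tf-idf weighting with length normalization as the usual document representation in clustering work. A minimal sketch, assuming raw term counts for tf, idf(t) = log(N / df(t)) where N is the number of documents and df(t) the number of documents containing t, and Euclidean (L2) normalization of each document vector; the cited works use several variants of these choices, and `tfidf_vectors` is a hypothetical helper.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Raw tf times log(N/df), then scale each document vector to unit L2 length."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Length normalization, so long documents do not dominate similarity scores.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors


docs = [
    ["citation", "context", "clustering"],
    ["citation", "graph"],
    ["word", "clustering"],
]
vecs = tfidf_vectors(docs)
for v in vecs:
    print({t: round(w, 3) for t, w in v.items()})
```

With this idf definition a term that occurs in every document receives weight zero, and rarer terms such as "context" outweigh shared ones such as "citation" within the same document.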
“…One of the approaches that combine the proposed above representations using a reference context has been shown in Aljaber et al (2010). The authors use the text surrounding the references to extend the text representation for document clustering.…”
Section: Combined Representation (mentioning; confidence: 99%)
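The quoted passage summarizes the idea of Aljaber et al (2010): words that surround a reference in citing papers are added to the cited paper's own representation. The sketch below is a simplified illustration under stated assumptions, not the authors' procedure: citation markers are assumed to appear as a bracketed document id such as `[d1]`, a fixed window of words on each side of the marker is taken as the citation context, and `citation_contexts` and `extend_representation` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) marker format: a bracketed document id such as "[d1]".
CITATION_MARKER = re.compile(r"\[(\w+)\]")
PUNCT = ".,;:()\"'"


def citation_contexts(citing_texts: dict[str, str], window: int = 3) -> dict[str, list[str]]:
    """Collect up to `window` words on each side of every citation marker, keyed by cited id."""
    contexts = defaultdict(list)
    for text in citing_texts.values():
        tokens = text.split()
        for i, token in enumerate(tokens):
            marker = CITATION_MARKER.fullmatch(token.strip(PUNCT))
            if marker is None:
                continue
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for word in neighbours:
                word = word.strip(PUNCT)
                # Skip neighbouring citation markers and pure-punctuation tokens.
                if word and not CITATION_MARKER.fullmatch(word):
                    contexts[marker.group(1)].append(word.lower())
    return dict(contexts)


def extend_representation(own_terms: dict[str, list[str]],
                          contexts: dict[str, list[str]]) -> dict[str, list[str]]:
    """Append citation-context words to each document's own terms."""
    return {doc_id: terms + contexts.get(doc_id, []) for doc_id, terms in own_terms.items()}


citing = {
    "p1": "Spectral methods [d1] group documents by topic.",
    "p2": "We extend bag-of-words features [d1] with context.",
}
contexts = citation_contexts(citing, window=2)
extended = extend_representation({"d1": ["clustering", "spectral"], "d2": ["retrieval"]}, contexts)
print(extended["d1"])
```

The extended term lists can then be weighted (for example with tf-idf) and clustered like ordinary documents; documents that are never cited simply keep their own terms.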
“…Reference [7] describe two flat clustering algorithms: the K-Means algorithm, an efficient and widely used document clustering method, and the expectation-maximization algorithm, which is computationally more expensive, but also more flexible. Reference [8] presented an approach for clustering scientific documents based on the utilization of citation contexts. Reference [9] gave a brief overview of the document clustering research and the developments in this field.…”
Section: Document Clustering For English and Other Languages (mentioning; confidence: 99%)
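The statement above describes K-Means as an efficient, widely used flat clustering method for documents. A minimal sketch of the standard Lloyd iteration (alternate an assignment step and a centroid-update step until no point changes cluster), assuming dense vectors, Euclidean distance, and initial centroids drawn at random from the data; `kmeans` and its parameters are hypothetical, and the expectation-maximization alternative mentioned in the same passage is not shown.

```python
import math
import random


def kmeans(points: list[list[float]], k: int, iterations: int = 100, seed: int = 0) -> list[int]:
    """Lloyd's K-Means; returns a cluster index for each point."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


# Toy unit-length document vectors forming two obvious groups.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = kmeans(points, k=2)
print(labels)
```

For length-normalized document vectors such as L2-normalized tf-idf, squared Euclidean distance equals 2 minus twice the cosine similarity, so the nearest centroid in this sketch tracks cosine similarity to the points it is compared against.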