2006
DOI: 10.28945/916

Advanced Data Clustering Methods of Mining Web Documents

Abstract: The aim of this paper is to evaluate, propose and improve the use of advanced web data clustering techniques, allowing data analysts to execute large-scale web data searches more efficiently. Increasing the efficiency of this search process requires a detailed knowledge of abstract categories, pattern matching techniques, and their relationship to search engine speed. In this paper we compare several alternative advanced techniques of data clustering in the creation of abstract categories for these algor…


Cited by 25 publications (6 citation statements)
References 3 publications
“…In contrast to [19], we do not cluster the content with the commonly used k-means approach but rather using Suffix Tree Clustering (STC) [21], an approach that focuses on the problem of cluster labeling. We justify our choice since this clustering technique merges base clusters with high textual overlaps and was shown to outperform group average agglomerative hierarchical clustering, k-means, buckshot, fractionation and single-pass algorithms [21,14].…”
Section: Methods (mentioning)
confidence: 99%
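The citing passage credits STC with merging base clusters that have high textual (document) overlap. The following is a minimal sketch of that merge step only. It assumes the base clusters (phrase mapped to a set of document ids) have already been extracted from a suffix tree, and it assumes the 0.5 mutual-overlap threshold usually quoted for STC. The names `similar` and `merge_base_clusters` are hypothetical and are not taken from the cited papers.

```python
from itertools import combinations


def similar(a: set[int], b: set[int], threshold: float = 0.5) -> bool:
    """Assumed STC-style test: the shared documents must exceed
    `threshold` of BOTH base clusters for them to be linked."""
    if not a or not b:
        return False
    overlap = len(a & b)
    return overlap / len(a) > threshold and overlap / len(b) > threshold


def merge_base_clusters(
    base: dict[str, set[int]], threshold: float = 0.5
) -> list[tuple[list[str], set[int]]]:
    """Link similar base clusters and return the connected components,
    each as (sorted phrase labels, union of their documents)."""
    phrases = list(base)
    parent = list(range(len(phrases)))  # union-find over phrase indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair of base clusters (quadratic in their number).
    for i, j in combinations(range(len(phrases)), 2):
        if similar(base[phrases[i]], base[phrases[j]], threshold):
            parent[find(i)] = find(j)

    # Group phrases by their component root.
    groups: dict[int, list[str]] = {}
    for i, phrase in enumerate(phrases):
        groups.setdefault(find(i), []).append(phrase)

    return [
        (sorted(labels), set().union(*(base[p] for p in labels)))
        for labels in groups.values()
    ]


if __name__ == "__main__":
    # Hypothetical base clusters, for illustration only.
    base = {
        "data mining": {1, 2, 3},
        "web data mining": {1, 2},
        "suffix tree": {4, 5},
        "tree clustering": {5, 6},
    }
    for labels, docs in merge_base_clusters(base):
        print(labels, sorted(docs))
```

In this toy input, "suffix tree" and "tree clustering" share exactly half of each cluster's documents, which does not exceed 0.5, so they stay separate. Comparing every pair makes the step quadratic in the number of base clusters, so it is usually applied only to a limited set of candidate base clusters.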
“…There are many ways to cluster web pages before finding patterns. The most common method is the K-means algorithm but there are several more like Single pass, Fractionation, Buckshot, Suffix tree and Apriori All, which are described in [Sambasivan et al 2006]. In [Sambasivan et al 2006] they also measure the execution time of the algorithms. Common ways to gain attributes from web pages are to take specific keywords and comparing their relevance to the rest of the text or excerpts of the web page.…”
Section: Clustering (mentioning)
confidence: 99%
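Single-pass clustering, one of the alternatives to K-means named in the passage, reads each document once and either adds it to the most similar existing cluster or starts a new one. The following is a minimal sketch. It assumes whitespace tokenization, raw term-frequency vectors as the page attributes, cosine similarity against each cluster's centroid, and a similarity threshold of 0.3. These choices and the function names (`tf_vector`, `cosine`, `single_pass`) are illustrative assumptions, not the method evaluated in [Sambasivan et al 2006].

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Assumed minimal tokenizer: lowercase and split on whitespace."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def single_pass(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Cluster documents in one pass; returns lists of document indices."""
    clusters: list[list[int]] = []
    centroids: list[Counter] = []
    for i, doc in enumerate(docs):
        vec = tf_vector(doc)
        # Find the most similar existing cluster, if any.
        best, best_sim = -1, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(i)
            # Summed counts point in the same direction as the mean,
            # and cosine similarity ignores vector length.
            centroids[best].update(vec)
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters


if __name__ == "__main__":
    docs = [
        "web data mining",
        "mining web data clusters",
        "suffix tree clustering",
        "suffix tree algorithm",
    ]
    print(single_pass(docs))  # [[0, 1], [2, 3]]
```

The result depends on the document order and on the threshold. Raising `threshold` to 0.9 on the same four documents puts each document in its own cluster.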
“…K-means clustering method is a typical kind of the clustering algorithm based on distance [9]. Its input $X=\{x_1, x_2, \ldots, x_n\}$, and classification number is $k$. The output is $k$ data type $C_j$, $j=1,2,\ldots,k$.…”
Section: The Wavelet De-noising Threshold Generation Based On K-means (mentioning)
confidence: 99%
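The K-means description in the passage above (input points X, number of clusters k, output clusters C_j) matches Lloyd's iteration. Each point is assigned to its nearest centroid, each centroid is recomputed as the mean of its cluster, and the process repeats until the centroids stop changing. The following is a minimal sketch. It assumes Euclidean distance and, for determinism, uses the first k points as the initial centroids (random or k-means++ seeding is more common in practice). The function name `kmeans` is hypothetical.

```python
import math

Point = tuple[float, ...]


def kmeans(X: list[Point], k: int, max_iter: int = 100) -> list[list[Point]]:
    """Lloyd's K-means: returns k clusters C_1..C_k as lists of points."""
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and len(X)")
    # Assumption: seed with the first k points (deterministic, not random).
    centroids: list[Point] = [X[i] for i in range(k)]
    clusters: list[list[Point]] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in X:
            j = min(range(k), key=lambda c: math.dist(x, centroids[c]))
            clusters[j].append(x)
        # Update step: centroid = coordinate-wise mean (keep old one if empty).
        new_centroids = [
            tuple(sum(coord) / len(C) for coord in zip(*C)) if C else centroids[j]
            for j, C in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return clusters


if __name__ == "__main__":
    X = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5), (8.5, 9.0)]
    print(kmeans(X, 2))
```

On this sample X with k = 2, the sketch converges after three iterations to the groups {(1, 1), (1.5, 2), (1, 0.5)} and {(8, 8), (9, 8.5), (8.5, 9)}. With a different seeding, K-means can settle in a different local optimum, which is why it is usually restarted from several initializations.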