2019
DOI: 10.1520/jte20180497
Link-Based Clustering Algorithm for Clustering Web Documents

Abstract: Clustering web documents involves feeding a large number of words into clustering algorithms such as K-Means, Cosine Similarity, Latent Dirichlet Allocation, and so on. This causes the clustering process to consume much time as the number of words in each document increases. In many web documents, web links are available along with the contents; these web link texts may contain a tremendous amount of information for clustering. In our work, we show that using the web link text alone gives b…

Cited by 7 publications (4 citation statements). References 23 publications.
“…Data resampling can cause important instances to be lost forever and often leads to oversampling. A work by [22] focuses on gaining the advantages of both the data level and the ensemble of classifiers: they apply a few pre-processing steps to the training phase of each classifier and compare the results on eight datasets.…”
Section: Related Work
confidence: 99%
“…As many methods work by altering the original dataset [24], a research work proposed by [22] aims to build a balanced dataset from an imbalanced one and performs an ensemble to consolidate the results. This process prevents important data from being lost during classification.…”
Section: Related Work
confidence: 99%
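The idea the two statements above attribute to [22] — build balanced subsets from an imbalanced dataset and train an ensemble so no majority-class instance is discarded for good — can be sketched as follows. The rotation scheme, function name, and toy data are illustrative assumptions, not the cited method:

```python
import random
from collections import Counter

def balanced_subsets(X, y, n_subsets, seed=0):
    """Undersample the majority class (label 0) into several balanced
    subsets, rotating through its instances so every one appears in some
    subset (hypothetical sketch of a data-level + ensemble approach)."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    rng.shuffle(majority)
    k = len(minority)
    subsets = []
    for s in range(n_subsets):
        # slide a window over the shuffled majority indices: no
        # instance is lost forever, each subset stays class-balanced
        maj_slice = [majority[(s * k + j) % len(majority)] for j in range(k)]
        idx = minority + maj_slice
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets

# toy imbalanced data: 2 minority vs. 8 majority instances
X = list(range(10))
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
subs = balanced_subsets(X, y, n_subsets=4)
print([Counter(labels) for _, labels in subs])  # each subset is 2-vs-2
```

One base classifier would then be trained per subset and their votes combined, which is the ensemble consolidation step the statement describes.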
“…Good features need to be identified to separate the classes. As the number of features increases, the complexity of the classifier also increases; this creates a need for better feature selection methods [34].…”
Section: Overall Drawbacks in Existing Feature Selection
confidence: 99%
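A minimal filter-style illustration of the point above — ranking features by how well they separate the two classes and keeping only the top few — might look like this. The scoring criterion (absolute difference of class-conditional means) and the toy data are assumptions for illustration, not the method of [34]:

```python
def select_top_features(X, y, k):
    """Rank features by the absolute difference of their class-conditional
    means and return the indices of the k highest-scoring features
    (a simple, hypothetical filter criterion)."""
    n_features = len(X[0])
    scores = []
    for f in range(n_features):
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), f))
    return [f for _, f in sorted(scores, reverse=True)[:k]]

# feature 0 separates the classes; feature 1 is nearly constant
X = [[1.0, 5.0, 0.2], [0.9, 5.1, 0.1], [0.1, 4.9, 0.9], [0.2, 5.0, 0.8]]
y = [1, 1, 0, 0]
print(select_top_features(X, y, k=1))  # → [0]
```

Dropping low-scoring features like feature 1 is exactly the complexity reduction the statement calls for.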
“…Let us say there is a vehicle theft at a place X; that means X has less security monitoring for crime Y, so the same area or the surrounding areas are likely to become vulnerable points. Link-based algorithms such as [21] will be helpful in creating a graph; a subsequent kNN algorithm can then easily predict the crime spots. Figure 4 shows a visualization of crime in San Francisco; the visualization shows which areas are vulnerable and lack security monitoring.…”
Section: Vulnerability Analysis
confidence: 99%
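The kNN step of the statement above — flagging a location as vulnerable by majority vote among its nearest known incidents — can be sketched as follows. The coordinates, labels, and function name are hypothetical; the graph-construction step from [21] is not shown:

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k=3):
    """Label a query location by majority vote among its k nearest
    known incidents (a minimal sketch; real crime data would use
    geographic coordinates and distances)."""
    nearest = sorted(range(len(points)),
                     key=lambda i: math.dist(points[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# toy map: thefts cluster near (0, 0), safe reports near (5, 5)
points = [(0, 0), (0.5, 0.2), (0.1, 0.6), (5, 5), (5.2, 4.9), (4.8, 5.1)]
labels = ["vulnerable", "vulnerable", "vulnerable", "safe", "safe", "safe"]
print(knn_predict(points, labels, (0.3, 0.3)))  # → "vulnerable"
```

A query near the theft cluster inherits the "vulnerable" label, matching the intuition that areas surrounding a known theft at X are themselves likely vulnerable.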