The continuous growth of the World Wide Web has led to long access delays. To reduce these delays, prefetching techniques predict users' browsing behavior and fetch web pages before the user explicitly requests them. Making near-accurate predictions of users' search behavior has been a complex task for researchers for many years, and various web mining techniques have been applied to it. However, each of these methods has its own set of drawbacks. In this paper, a novel hybrid prediction model is proposed that integrates usage mining and content mining techniques to tackle the individual challenges of both approaches. The proposed method uses N-gram parsing along with the click count of the queries to capture more contextual information and thereby improve the prediction of web pages. The hybrid approach has been evaluated on AOL search logs, where it shows, on average, a 26% increase in prediction precision and a 10% increase in hit ratio compared with other mining techniques.
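The abstract does not specify how N-grams and click counts are combined. Below is a minimal Python sketch of one possible usage-mining side, built on several assumptions. Each log record is a hypothetical `(query, clicked_url, clicks)` tuple. Every unigram and bigram of a query is credited with the clicks its URL received. A match on a longer N-gram is weighted by `n` because it carries more context. The function names and the weighting are illustrative and do not reproduce the authors' model, and the content-mining component is not shown.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_scores(log, max_n=2):
    """Credit every n-gram (n <= max_n) of a query with the clicks its URL received."""
    scores = defaultdict(Counter)
    for query, url, clicks in log:
        tokens = query.lower().split()
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                scores[gram][url] += clicks
    return scores

def predict(query, scores, max_n=2, k=3):
    """Rank candidate URLs for a new query by click-weighted n-gram evidence."""
    tokens = query.lower().split()
    totals = Counter()
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            # Assumption: an n-gram match counts n times, favouring longer context.
            for url, clicks in scores.get(gram, {}).items():
                totals[url] += n * clicks
    return [url for url, _ in totals.most_common(k)]

# Hypothetical toy log of (query, clicked_url, click_count) records.
log = [
    ("cheap flights", "expedia.com", 5),
    ("cheap flights london", "ba.com", 3),
    ("cheap hotels", "hotels.com", 4),
]
scores = build_scores(log)
print(predict("cheap flights paris", scores))
```

In this toy log the bigram "cheap flights" separates expedia.com (score 20) from ba.com (score 12). A unigram-only model would score them 10 and 6, which is the extra context an N-gram representation is meant to add.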
Background:
Clustering is one of the important techniques in data mining for grouping related data. It can be applied to numerical data as well as to web objects such as URLs, websites, documents, and keywords, and it is the building block for many recommender systems and prediction models.
Objective:
The objective of this research article is to develop an optimal clustering approach that considers the semantics of web objects when grouping them. More importantly, the proposed work specifically aims to reduce the computation time of the clustering process.
Methods:
To achieve these objectives, the following two contributions are proposed to improve the clustering approach: 1) a semantic similarity measure based on Wu-Palmer similarity, and 2) a two-level density-based clustering technique that reduces the computational complexity of the density-based clustering approach.
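Wu-Palmer similarity scores two concepts in a taxonomy by the depth of their lowest common subsumer (LCS): sim(c1, c2) = 2 · depth(LCS) / (depth(c1) + depth(c2)). The Python sketch below computes it over a small hypothetical taxonomy (the `PARENT` table is invented for illustration), with the root at depth 1. How the article lifts concept-level scores to whole queries or URLs is not stated in the abstract and is not shown here.

```python
# Hypothetical toy taxonomy: child -> parent; "entity" is the root.
PARENT = {
    "vehicle": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "animal": "entity",
    "dog": "animal",
}

def path_to_root(concept):
    """Return [concept, parent, ..., root]."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(concept):
    # The root has depth 1, so the denominator below is never zero.
    return len(path_to_root(concept))

def lowest_common_subsumer(c1, c2):
    """Deepest ancestor shared by c1 and c2 (tree taxonomy assumed)."""
    ancestors = set(path_to_root(c1))
    for node in path_to_root(c2):  # walks upward from c2, deepest first
        if node in ancestors:
            return node
    return None

def wu_palmer(c1, c2):
    lcs = lowest_common_subsumer(c1, c2)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(c1) + depth(c2))

print(wu_palmer("car", "truck"), wu_palmer("car", "dog"))
```

"car" and "truck" share the deeper subsumer "vehicle" and score 2/3, while "car" and "dog" meet only at the root and score 1/3. For real vocabularies, NLTK's WordNet interface exposes a `wup_similarity` method on synsets. Its depth convention may differ slightly from this toy version.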
Results:
The efficacy of the proposed method has been analyzed on AOL search logs containing 20 million web queries.
The results show that our approach increases the F-measure and decreases the entropy. It also reduces the computational complexity and provides a competitive alternative strategy for semantic clustering when conventional methods do not provide helpful suggestions.
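The abstract reports F-measure and entropy without defining them. The Python sketch below assumes the standard cluster-evaluation definitions. Entropy is the size-weighted average over clusters of -Σ p·log2 p, taken over the class mix inside each cluster, so lower is better. The F-measure gives each reference class its best F-score against any cluster and then averages these with class-size weights, so higher is better. The labels are hypothetical toy data, not AOL results.

```python
import math
from collections import Counter

def clustering_entropy(labels_true, labels_pred):
    """Size-weighted average of the class entropy inside each cluster (lower is better)."""
    n = len(labels_true)
    joint = Counter(zip(labels_pred, labels_true))
    total = 0.0
    for cluster, size in Counter(labels_pred).items():
        e = 0.0
        for (c, _cls), n_ij in joint.items():
            if c == cluster:
                p = n_ij / size
                e -= p * math.log2(p)
        total += size / n * e
    return total

def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted best F-score of each class against any cluster (higher is better)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    cluster_sizes = Counter(labels_pred)
    total = 0.0
    for cls, n_i in Counter(labels_true).items():
        best = 0.0
        for cluster, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, cluster), 0)
            if n_ij:
                precision, recall = n_ij / n_j, n_ij / n_i
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total

truth = ["travel", "travel", "travel", "food", "food", "food"]
pred = [0, 0, 1, 1, 1, 1]  # one "travel" query lands in the "food" cluster
print(clustering_entropy(truth, pred), clustering_f_measure(truth, pred))
```

A perfect clustering gives entropy 0 and F-measure 1. Misplacing one query in the toy example raises the entropy to about 0.54 and lowers the F-measure to 29/35 (about 0.83).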
Conclusion:
A clustering model has been proposed that is composed of two components, i.e., a similarity measure and a density-based two-level clustering technique. The proposed model reduces the time cost of the density-based clustering approach without affecting performance.
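The abstract does not describe how the two levels work. The Python sketch below shows one hypothetical reading and is not the authors' algorithm. Level 1 partitions objects with a cheap `coarse_key` function. Level 2 runs plain DBSCAN inside each partition using the expensive distance, for example 1 minus a Wu-Palmer-based similarity. Full DBSCAN evaluates O(n²) distances. The partitioned version evaluates only the sum of the squared partition sizes. The demo uses toy 1-D numbers so that it runs on its own. `coarse_key`, `eps`, and `min_pts` are illustrative parameters.

```python
from collections import defaultdict

NOISE = -1

def dbscan(points, dist, eps, min_pts):
    """Plain DBSCAN with a caller-supplied distance; returns one label per point."""
    labels = [None] * len(points)

    def neighbours(i):
        # The neighbourhood includes i itself, as in standard DBSCAN.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neigh)
        cluster_id += 1
    return labels

def two_level_dbscan(points, coarse_key, dist, eps, min_pts):
    """Hypothetical two-level scheme: cheap partitioning, then DBSCAN per partition."""
    groups = defaultdict(list)
    for idx, p in enumerate(points):
        groups[coarse_key(p)].append(idx)  # level 1: no pairwise distances needed
    labels = [NOISE] * len(points)
    next_id = 0
    for members in groups.values():
        local = dbscan([points[i] for i in members], dist, eps, min_pts)  # level 2
        for idx, lab in zip(members, local):
            if lab != NOISE:
                labels[idx] = next_id + lab  # make cluster ids globally unique
        next_id += max(local, default=NOISE) + 1
    return labels

points = [1.0, 1.5, 2.0, 11.0, 11.5, 12.0, 30.0]
print(two_level_dbscan(points, lambda x: int(x // 10), lambda a, b: abs(a - b), 1.0, 2))
```

The trade-off in this design is that objects split across different coarse partitions can never share a cluster. The level-1 key therefore has to keep semantically close objects together.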