Determining WWW user's next access and its application to pre-fetching

Cunha, Carlos R.; Jaccoud, C.F.B.

doi:10.1109/iscc.1997.615962

Cited by 46 publications

(30 citation statements)

References 15 publications


“…The problem of finding the MPESs, the most difficult part of the CES discovery process, maps nicely to sequence mining discovery [5,11,8,22,24] which is a standard approach for finding sequential patterns in a dataset. Web usage mining (e.g., [8,22,26,18,23]) is a sub-area within the area of sequence mining particularly applicable to the MPES-discovery problem.…”

Section: Related Work (mentioning)

confidence: 99%

A fast method for discovering critical edge sequences in e-commerce catalogs

Dutta, VanderMeer, Datta, et al. 2007

European Journal of Operational Research


Web sites allow the collection of vast amounts of navigational data: clickstreams of user traversals through the site. These massive data stores offer the tantalizing possibility of uncovering interesting patterns within the dataset. For e-businesses, always looking for an edge in the hypercompetitive online marketplace, the discovery of Critical Edge Sequences (CESs), which denote frequently traversed sequences in the catalog, is of significant interest. CESs can be used to improve site performance and site management, increase the effectiveness of advertising on the site, and gather additional knowledge of customer behavior patterns on the site. Using web mining strategies to find CESs turns out to be expensive in both space and time. In this paper, we propose an approximate algorithm to compute the most popular traversal sequences between node pairs in a catalog, which are then used to discover CESs. Our method is both fast and space efficient, providing a vast reduction in both the run time and storage requirements, with minimum impact on accuracy.


“…Precision, defined in this way, just evaluates the algorithm without considering physical system restrictions; e.g., cache, network or time restrictions; therefore, it can be seen as a theoretical index. Other research studies refer to this index as Accuracy [4], [12], [13], [14], [15], [16], [17] while some others use a probabilistic notation; e.g., some Markov chains based models like [18] Pr(hit|match).…”

Section: Generic Indexes Precision (Pc) (mentioning)

confidence: 99%

“…Precision [8], [9], [10], [11] Accuracy [4], [12], [13], [14], [15], [16], [17] Precision Pr(hit|match) [18] Recall [9], [10], [11] Usefulness [16], [17] Hit Ratio [19] Recall Predictability [6] 1. Prediction Applicability Applicability [8] Traffic increase [3], [4], [19] Wasted Bandwidth [2] Bandwidth ratio [21] Extra bytes [10] Data transfer [22] Traffic increase Latency [5], [23] Access time [3] 3.…”

Section: References (mentioning)

confidence: 99%

About the Heterogeneity of Web Prefetching Performance Key Metrics

Domènech, Sahuquillo, Gil, et al. 2004

Intelligence in Communication Systems


Abstract. Web prefetching techniques have pointed to be especially important to reduce web latencies and, consequently, an important set of works can be found in the open literature. But, in general, it is not possible to do a fair comparison among the proposed prefetching techniques due to three main reasons: i) the underlying baseline system where prefetching is applied differs widely among the studies; ii) the workload used in the presented experiments is not the same; iii) different performance key metrics are used to evaluate their benefits. This paper focuses on the third reason. Our main concern is to identify which the main meaningful indexes are when studying the performance of different prefetching techniques. For this purpose, we propose a taxonomy based in three categories, which permits us to identify analogies and differences among the indexes commonly used. In order to check, in a more formal way, the relation between them, we run experiments and estimate statistically the correlation among a representative subset of those metrics. The statistical results help us to suggest which indexes should be selected when performing evaluation studies depending on the different elements in the considered web architecture. The choice of the appropriate key metric is of paramount importance for a correct and representative study. As our experimental results show, depending on the metric used to check the system performance, results can not only widely vary but also reach opposite conclusions.


“…Cunha and Jaccoud (1997) studied the problem of determining a user's next page accessed in a session for the purpose of determining how to pre-fetch pages and optimize a website's performance. The problem was modeled as a Markov process in which the next page accessed depends on the most recent sliding window.…”

Section: Inputs (mentioning)

confidence: 99%
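
The window-based next-access idea in the statement above can be illustrated with a simple frequency count. This is only a minimal sketch, not the method of Cunha and Jaccoud: the function names `train` and `predict_next`, the page labels, and the choice of "most frequent successor" as the prediction rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(sessions, w):
    """Count which page follows each run of w consecutive pages (hypothetical helper)."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Each context is a window of w pages; the page right after it is the target.
        for i in range(len(session) - w):
            context = tuple(session[i:i + w])
            counts[context][session[i + w]] += 1
    return counts


def predict_next(counts, recent, w):
    """Return the most frequent successor of the last w pages, or None if unseen."""
    context = tuple(recent[-w:])
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up sessions, used only to exercise the sketch.
sessions = [
    ["home", "catalog", "item"],
    ["home", "catalog", "item"],
    ["home", "catalog", "cart"],
]
model = train(sessions, w=2)
prediction = predict_next(model, ["home", "catalog"], w=2)
```

A pre-fetcher built on such a model would request `prediction` ahead of time whenever it is not `None`; how aggressively to do so is a separate design choice that this sketch does not address.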

“…Sliding window of fixed length w (Cunha and Jaccoud 1997). A single session of length n is broken down into n − w + 1 sliding windows of length w. Hence, the implicit unit of analysis is a group of w consecutive clicks within a session.…”

Section: Introduction (mentioning)

confidence: 99%
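
The sliding-window breakdown described in the statement above (a session of length n yields n − w + 1 windows of length w) can be sketched as follows. The function name `sliding_windows` and the sample session are assumptions for illustration, not taken from either paper.

```python
def sliding_windows(session, w):
    """Split one session into all runs of w consecutive clicks (hypothetical helper)."""
    if w <= 0:
        raise ValueError("window length w must be positive")
    # A session of length n yields n - w + 1 windows; none if n < w.
    return [session[i:i + w] for i in range(len(session) - w + 1)]


# Made-up session of n = 5 clicks with w = 3.
session = ["a", "b", "c", "d", "e"]
windows = sliding_windows(session, 3)
```

With n = 5 and w = 3 this gives 5 − 3 + 1 = 3 windows, and a session shorter than w contributes none.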

On the Existence and Significance of Data Preprocessing Biases in Web-Usage Mining

Zheng, Padmanabhan, Kimbrough 2003

INFORMS Journal on Computing


The literature on web-usage mining is replete with data preprocessing techniques, which correspond to many closely related problem formulations. We survey data preprocessing techniques for session-level pattern discovery and compare three of these techniques in the context of understanding session-level purchase behavior on the web. Using real data collected from 20,000 users' browsing behavior over a period of six months, four different models (linear regressions, logistic regressions, neural networks, and classification trees) are built based on data preprocessed using three different techniques. The results demonstrate that the three approaches result in radically different conclusions and provide initial evidence that a data preprocessing bias exists, the effect of which can be significant. (Information Systems; Analysis and Design; Decision Support Systems)

