2003
DOI: 10.1287/ijoc.15.2.148.14449

On the Existence and Significance of Data Preprocessing Biases in Web-Usage Mining

Abstract: The literature on web-usage mining is replete with data preprocessing techniques, which correspond to many closely related problem formulations. We survey data preprocessing techniques for session-level pattern discovery and compare three of these techniques in the context of understanding session-level purchase behavior on the web. Using real data collected from 20,000 users' browsing behavior over a period of six months, four different models (linear regressions, logistic regressions, neural networks, and cl…

Cited by 20 publications (12 citation statements)
References 22 publications
“…Other data reduction and pre-processing techniques are discussed in Han and Kamber (2006). Zheng et al (2003) study some popular data-reduction methods and show that they are not without their pitfalls since different methods can lead to drastically different results, both in terms of characterizing the original data and out-of-sample predictions. In the analysis that follows, we show that our method of aggregation has no such problems (at least within the broad scope of our analysis).…”
Section: Related Literature (mentioning)
confidence: 99%
“…Padmanabhan, Zheng, and Kimbrough have studied the impact of data preparation alternatives upon Web-usage mining in Padmanabhan et al (2001) and in Zheng et al (2003). In Padmanabhan et al (2001), they focus on the prediction of purchase for users visiting multiple sites.…”
Section: Spiliopoulou Mobasher Berendt and Nakagawa A Framework Fo (mentioning)
confidence: 99%
“…The authors show that when the analysis is based on the activities inside one site only, the accuracy of the predictors drops significantly. Zheng et al (2003) compare a set of methods for purchase prediction, each of which exploits different components of the users' sessions on which to make predictions. They compute the prediction accuracy of these methods using several classifiers.…”
Section: Spiliopoulou Mobasher Berendt and Nakagawa A Framework Fo (mentioning)
confidence: 99%
“…The second eCRM problem, primarily in the DM domain, pertains to preprocessing click-stream data as the basis for building DM models, such as purchase prediction models. Zheng et al (2003) show that inappropriate preprocessing of data can result in significantly worse DM models for critical eCRM problems. Given the nature of click-stream data and the fact that hundreds of derived variables can be created from this data for a user session, it is important to partition…”
Section: Introduction (mentioning)
confidence: 99%
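
The recurring point in these statements, that one click-stream can be preprocessed into several different session-level data sets, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Click` record, the three "views" (whole session, first k clicks, clicks at a single site), and the derived variables are hypothetical and are not the specific techniques compared by Zheng et al (2003).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Click:
    """One page view in a user session (hypothetical record layout)."""
    site: str
    seconds_on_page: float
    is_purchase: bool


def session_features(clicks: List[Click]) -> Dict[str, float]:
    """Derive a few simple session-level variables from a list of clicks."""
    return {
        "n_clicks": float(len(clicks)),
        "total_seconds": sum(c.seconds_on_page for c in clicks),
        "n_sites": float(len({c.site for c in clicks})),
    }


def full_session_view(clicks: List[Click]) -> Dict[str, float]:
    """Features computed from every click in the session."""
    return session_features(clicks)


def prefix_view(clicks: List[Click], k: int) -> Dict[str, float]:
    """Features computed from only the first k clicks."""
    return session_features(clicks[:k])


def single_site_view(clicks: List[Click], site: str) -> Dict[str, float]:
    """Features computed from only the clicks made at one site."""
    return session_features([c for c in clicks if c.site == site])


# A made-up session spanning two sites and ending in a purchase.
session = [
    Click("shop.example", 30.0, False),
    Click("reviews.example", 45.0, False),
    Click("shop.example", 20.0, False),
    Click("shop.example", 60.0, True),
]

# The label is the same for all views; only the inputs differ.
purchased = any(c.is_purchase for c in session)

for name, features in [
    ("full session", full_session_view(session)),
    ("first 2 clicks", prefix_view(session, 2)),
    ("shop.example only", single_site_view(session, "shop.example")),
]:
    print(name, features, "purchased =", purchased)
```

Each view hands a model different values for the same session, which is one way to see why the choice of preprocessing can change both the description of the data and the resulting predictions.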