2008
DOI: 10.4018/jdwm.2008040104

The Power of Sampling and Stacking for the PaKDD-2007 Cross-Selling Problem

Abstract: This article presents an efficient solution for the PAKDD-2007 Competition cross-selling problem. The solution is based on a thorough approach which involves the creation of new input variables, efficient data preparation and transformation, an adequate data sampling strategy, and a combination of two of the most robust modeling techniques. Due to the complexity imposed by the very small number of examples in the target class, the approach for model robustness was to produce the median score of the 11 models devel…
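The median-score ensemble mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual code: the scores are random placeholders, and in the paper each of the 11 models would be trained on its own sample of the imbalanced data.

```python
import numpy as np

# Hypothetical scores from 11 models for 5 examples
# (rows: models, columns: examples).
rng = np.random.default_rng(0)
scores = rng.random((11, 5))

# The ensemble output is the median score across the 11 models,
# which is robust to a few outlier models in the committee.
ensemble_score = np.median(scores, axis=0)
print(ensemble_score.shape)  # one score per example
```

Taking the median rather than the mean limits the influence of any single badly trained model, which matters when each model sees only a small sample of the rare target class.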

Cited by 17 publications (9 citation statements)

References 13 publications
“…She concluded that MLP neural networks were the most robust against performance degradation. More recently, several applications have confirmed the robustness of these neural networks on real-world problems [8,9].…”
Section: Feature Extraction
confidence: 94%
“…It has also been successfully used in data mining applications, either in isolation [17] or in ensembles [8,9]. However, one drawback of this technique is the need for a validation (holdout) data set to prevent overfitting when little labeled data is available.…”
Section: Feature Extraction
confidence: 99%
“…Therefore, this paper will use the area under the ROC curve (AUC_ROC) [Provost and Fawcett, 2001], one of the most widely accepted performance metrics for binary classifiers. This paper will also use the maximum Kolmogorov-Smirnov distance (Max_KS2) as a dissimilarity metric [Conover, 1999] to assess continuous-score classifier output, as commonly done in financial applications [Adeodato et al, 2008].…”
Section: Performance Metrics
confidence: 99%
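The two metrics named in the quote above can be computed with standard libraries. This is a hedged sketch on hypothetical toy data, assuming scikit-learn and SciPy; it is not code from the cited paper.

```python
import numpy as np
from scipy.stats import ks_2samp           # two-sample Kolmogorov-Smirnov test
from sklearn.metrics import roc_auc_score  # area under the ROC curve

# Hypothetical continuous classifier scores and binary labels.
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0, 1, 0])
y_score = np.array([0.1, 0.3, 0.25, 0.8, 0.6, 0.2, 0.9, 0.4, 0.7, 0.15])

# AUC_ROC: probability that a random positive is scored above a random negative.
auc = roc_auc_score(y_true, y_score)

# Max_KS2: maximum distance between the empirical score distributions
# of the positive and negative classes.
ks = ks_2samp(y_score[y_true == 1], y_score[y_true == 0]).statistic

print(auc, ks)  # both 1.0 here, since the toy classes are perfectly separated
```

Both metrics evaluate a continuous score directly, without fixing a decision threshold, which is why they suit credit-scoring style applications where the score itself is ranked.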
“…Three years ago, Adeodato et al [12] successfully applied a modified version of the n-tuple classifier to a real problem in the PAKDD 2007 data mining competition. That system has evolved, and this paper presents the updated version of this approach, which combines: 1. an architecture similar to the n-tuple classifier, 2. the pRAM neuron with a slightly different recall mode, and 3. additive Gaussian noise training, without any constraints for classification problems.…”
confidence: 99%