2004
DOI: 10.1023/b:mach.0000035476.95130.99

A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000

Abstract: The CoIL Challenge 2000 data mining competition attracted a wide variety of solutions, both in terms of approaches and performance. The goal of the competition was to predict who would be interested in buying a specific insurance product and to explain why people would buy. Unlike in most other competitions, the majority of participants provided a report describing the path to their solution. In this article we use the framework of bias-variance decomposition of error to analyze what caused the wide …
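For context on the framework named in the abstract: the bias-variance decomposition is most easily stated for squared loss, as below; the article itself works with a classification-error analogue, so this is textbook background rather than a formula quoted from the paper. Here \hat{f}_D is the model learned from training set D, \bar{f}(x) = E_D[\hat{f}_D(x)] is the average prediction over training sets, f^*(x) is the optimal prediction, and \sigma^2 is irreducible noise:

    E_{D,\varepsilon}\big[(y - \hat{f}_D(x))^2\big]
      = \underbrace{\big(\bar{f}(x) - f^*(x)\big)^2}_{\text{bias}^2}
      + \underbrace{E_D\big[(\hat{f}_D(x) - \bar{f}(x))^2\big]}_{\text{variance}}
      + \underbrace{\sigma^2}_{\text{noise}}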

Cited by 89 publications (47 citation statements)
References 13 publications
“…Before answering this question, we note that HLC models and observed BN models both have their pros and cons. The classification task is to identify a subset of 800 customers that contains as many mobile home policy owners as possible. The classification performance of M_HLC ranks at Number 5 among the 43 entries to the CoIL Challenge 2000 contest [5], and it is not far from the performance of the best entry. This is impressive considering that no attempt was made to minimize classification error when learning M_HLC.…”
Section: Probabilistic Modeling (mentioning)
confidence: 92%
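To make the quoted ranking task concrete: CoIL Challenge 2000 entries were scored by selecting the 800 test-set customers judged most likely to own a mobile home (caravan) policy and counting the actual owners among them. A minimal sketch of that scoring rule, assuming a vector of model scores and binary ownership labels (the variable names and the synthetic data are illustrative, not the real competition data):

    import numpy as np

    def coil_score(scores, owns_policy, k=800):
        # Count actual policy owners among the k customers ranked most likely to buy.
        top_k = np.argsort(scores)[::-1][:k]
        return int(np.asarray(owns_policy)[top_k].sum())

    # Illustrative call on synthetic data with a ~6% ownership base rate.
    rng = np.random.default_rng(0)
    scores = rng.random(4000)
    owns_policy = rng.random(4000) < 0.06
    print(coil_score(scores, owns_policy))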
“…The CoIL Challenge 2000 data set [5] contains information on customers of a Dutch insurance company. The data consists of 86 variables, around half of which are about ownership of various insurance products.…”
Section: Introduction (mentioning)
confidence: 99%
“…Internet demographic information on internet users, 70 variables, 10k examples [12]. CoIL2000 insurance customer management, 86 variables, 6k examples [23]. MITFace face recognition dataset, discretized to 4 bins using equal frequency, 362 variables, 31k examples [17].…”
Section: Experimental Evaluation (mentioning)
confidence: 99%
“…On the one hand, NB is a high-bias, low-variance classifier (see [8] for a theoretical analysis and [11] for an experimental one) and local learning reduces the bias; on the other hand, local learning reduces the chance of encountering strong dependencies between features [7]; there is in fact evidence that local naive Bayes can outperform global naive Bayes [15,7].…”
Section: Lazy Classifiers (mentioning)
confidence: 99%
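The bias/variance characterization of naive Bayes quoted above can be checked empirically. One common recipe (a Domingos-style 0/1-loss decomposition; this sketch is an illustration under that assumption, not the exact protocol of [8] or [11]) trains the classifier on many bootstrap resamples, takes the per-example majority vote as the main prediction, and reads bias off disagreement between the main prediction and the truth, variance off disagreement between individual models and the main prediction:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB

    def bias_variance_01(make_model, X_tr, y_tr, X_te, y_te, n_rounds=50, seed=0):
        # Estimate 0/1-loss bias and variance over bootstrap training sets.
        rng = np.random.default_rng(seed)
        n = len(y_tr)
        preds = np.empty((n_rounds, len(y_te)), dtype=int)
        for r in range(n_rounds):
            idx = rng.integers(0, n, size=n)   # bootstrap resample of the training set
            preds[r] = make_model().fit(X_tr[idx], y_tr[idx]).predict(X_te)
        # Main prediction: per-example majority vote across the rounds.
        main = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
        bias = np.mean(main != y_te)           # main prediction disagrees with truth
        variance = np.mean(preds != main)      # models disagree with the main prediction
        return bias, variance

    # Illustrative run on synthetic data, not one of the cited benchmarks.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    print(bias_variance_01(GaussianNB, X[:1000], y[:1000], X[1000:], y[1000:]))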