2009
DOI: 10.1007/s10618-009-0158-x

Medical data mining: insights from winning two competitions

Abstract: Two major data mining competitions in 2008 presented challenges in medical domains: KDD Cup 2008, which concerned cancer detection from mammography data; and Informs Data Mining Challenge 2008, dealing with diagnosis of pneumonia based on patient information from hospital files. Our team won both of these competitions, and in this paper we share our lessons learned and insights. We emphasize the aspects that pertain to the general practice and methodology of medical data mining, rather than to the specifics of…

Cited by 44 publications (27 citation statements)
References 18 publications
“…In contrast to the usual Kaggle contest in which the withheld data in the test set are already recorded, the NCAA contest involved predicting the future. Thus the usual concerns with leakage in data prediction contests (Rosset et al 2010; Kaufman et al 2012) did not exist for the NCAA tournament prediction Kaggle competition.…”
Section: Mark E Glickman* and Jeff Sonas (mentioning)
confidence: 99%
“…Two medical data mining contests held the following year and which also exhibited leakage are discussed in Perlich et al [2008] and Rosset et al [2010]. KDD Cup 2008 dealt with cancer detection from mammography data.…”
Section: Fig (mentioning)
confidence: 99%
“…Data mining is a process and methodology for applying tools and techniques that explore and analyze a large number of data to discover meaningful patterns and rules. It is applicable to a wide variety of fields, from business to medicine and engineering (Rosset et al, 2010; Cheng et al, 2010). Hsu (2009) applied a data mining framework by using anthropometric data to develop industrial standards for adult females.…”
Section: Introduction (mentioning)
confidence: 99%