2007
DOI: 10.1002/gepi.20280

Data mining, neural nets, trees — Problems 2 and 3 of Genetic Analysis Workshop 15

Abstract: Genome-wide association studies using thousands to hundreds of thousands of single nucleotide polymorphism (SNP) markers and region-wide association studies using a dense panel of SNPs are already in use to identify disease susceptibility genes and to predict disease risk in individuals. Because these tasks become increasingly important, three different data sets were provided for the Genetic Analysis Workshop 15, thus allowing examination of various novel and existing data mining methods for both classificati…

Cited by 28 publications (31 citation statements)
References 39 publications
“…Brinza et al [personal communication] applied a complementary greedy search algorithm, a machine learning approach developed by his group [Brinza and Zelikovsky, 2006], to identify SNPs associated with RA, and also to predict susceptibility of the tested genotype to RA subjects using the Problem 2 data set. Their paper is discussed in more detail in the Group 6 summary paper [Ziegler et al, 2007]. Qin et al [2007] developed a graphical display tool called SIMLA-PLOT for visualizing different ways in which continuous covariates may influence the genotype-specific risk for complex human diseases.…”
Section: Single SNP and Haplotype Analyses (mentioning, confidence: 99%)
“…Applications of random forests in medical research have mostly focused on the classification of genetic data (e.g., Schwarz et al, 2007; Schwender et al, 2004; for an overview, see Ziegler et al (2007)). As the name implies, the basic units of this method are trees, and it utilises a combination of manipulating the training cases together with introducing an additional element of randomness.…”
Section: Random Forests (mentioning, confidence: 99%)
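The quoted description names two sources of randomness in a random forest: manipulating the training cases (bootstrap resampling) and an additional random element (a random subset of candidate features). The Python sketch below illustrates both under simplifying assumptions; it is not the method of any cited paper. Each tree is a single Gini split (a stump), genotypes are assumed to be coded 0/1/2, and the names `fit_forest`, `predict`, `n_trees`, and `mtry` are hypothetical. Because a stump has only one node, drawing the feature subset once per tree is equivalent here to drawing it at each node.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y, features):
    """Return (score, feature, threshold, left_label, right_label) for the split
    among `features` with the lowest weighted Gini impurity, or None if none exists."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # degenerate split: one side empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Hypothetical forest of stumps: bootstrap cases + random candidate SNPs."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        # Manipulate the training cases: bootstrap sample drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Additional randomness: only `mtry` randomly chosen SNPs are candidates.
        feats = rng.sample(range(p), mtry)
        stump = best_stump(Xb, yb, feats)
        if stump is None:  # no usable split: always predict the majority class
            majority = Counter(yb).most_common(1)[0][0]
            stump = (0.0, feats[0], float("inf"), majority, majority)
        forest.append(stump[1:])
    return forest


def predict(forest, row):
    """Classify one genotype vector by majority vote over the trees."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]
```

For example, with two SNPs and a toy recessive label (`1` only when the first SNP equals 2), `fit_forest(X, y, mtry=2)` followed by `predict(forest, [2, 0])` should vote for class 1. Real analyses would grow full-depth trees and use an established implementation.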
“…The second importance measure is a generalisation of the Gini index from a single tree to a forest. The basic idea of this importance measure is to contrast the impurity of a tree with and without the feature of interest being included in the tree; for details, see Ziegler et al (2007). If the estimated importance of all features can be assumed to be independent from tree to tree, a standard error of the importance can be computed in a usual way so that asymptotic confidence intervals assuming normality can be calculated (Lin et al, 2004).…”
Section: Random Forests (mentioning, confidence: 99%)
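The normal-approximation interval mentioned in the quotation can be written out directly. The Python sketch below assumes that a list of per-tree importance values (for example, per-tree Gini decreases) is already available and that these values are independent from tree to tree, as the quoted passage requires; the function name `importance_ci` is hypothetical. The forest-level importance is the mean over the T trees, its standard error is the sample standard deviation divided by the square root of T, and the interval is the mean plus or minus the normal quantile times that standard error.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def importance_ci(per_tree, level=0.95):
    """Mean importance over T trees, its standard error, and a normal-theory CI.

    Assumes the per-tree importances are independent from tree to tree."""
    t = len(per_tree)
    if t < 2:
        raise ValueError("need at least two trees to estimate a standard error")
    m = mean(per_tree)
    se = stdev(per_tree) / sqrt(t)             # sample SD / sqrt(T)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return m, se, (m - z * se, m + z * se)
```

If the independence assumption is doubtful (trees grown on overlapping bootstrap samples are not strictly independent), this interval should be read as approximate.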
“…Although the approaches taken and goals proposed are very different, there are common themes in the approaches as well as a remarkable level of confirmation of some results. Although such important techniques as random forest [Breiman, 2001], boosting [Schapire, 1990] and ensemble approaches [Dietterich, 2000] were used extensively in the data mining analyses for the other problems [Ziegler et al, 2007], these techniques were not applied in this group. Table I summarizes the 13 papers, indicating common themes among many of them.…”
(mentioning, confidence: 99%)