2014
DOI: 10.1002/bimj.201300077

Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications

Abstract: Machine learning methods are applied to three different large datasets, all dealing with probability estimation problems for dichotomous or multicategory data. Specifically, we investigate k-nearest neighbors, bagged nearest neighbors, random forests for probability estimation trees, and support vector machines with the kernels of Bessel, linear, Laplacian, and radial basis type. Comparisons are made with logistic regression. The dataset from the German Stroke Study Collaboration with dichotomous and three-cat…
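The abstract pits non-parametric probability estimators against logistic regression. As a minimal sketch of that kind of comparison, assuming simulated dichotomous data rather than the study's datasets, the following R snippet estimates class-membership probabilities with a random forest and with logistic regression and compares them by Brier score; all names, settings, and the choice of score are illustrative assumptions, not the paper's code.

## A minimal sketch (not the study's code): random forest vs. logistic
## regression for probability estimation on a simulated dichotomous outcome.
library(randomForest)

set.seed(1)
n <- 500; p <- 10
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("x", seq_len(p))
true_prob <- plogis(x[, 1] - 0.5 * x[, 2])   # true class-1 probability
y <- factor(rbinom(n, 1, true_prob))
dat <- data.frame(y, x)

## Random forest: the estimated probability is the fraction of trees voting
## for each class; with no newdata, predict() returns out-of-bag estimates.
rf <- randomForest(y ~ ., data = dat, ntree = 500)
p_rf <- predict(rf, type = "prob")[, "1"]

## Logistic regression as the parametric reference.
lr <- glm(y ~ ., data = dat, family = binomial)
p_lr <- fitted(lr)

## Brier score: mean squared distance between estimate and observed outcome.
brier <- function(p, y) mean((p - (y == "1"))^2)
c(randomForest = brier(p_rf, y), logistic = brier(p_lr, y))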

Cited by 49 publications (69 citation statements)
References 50 publications
“…First, the R packages randomForest (Liaw and Wiener 2002), randomForestSRC (Ishwaran and Kogalur 2015) and Rborist (Seligman 2015), the C++ application Random Jungle (Schwarz et al 2010; Kruppa et al 2014b), and the R version of the new implementation ranger were run with small simulated datasets, a varying number of features p, sample size n, number of features tried for splitting (mtry), and a varying number of trees grown in the RF. In each case, the other three parameters were kept fixed to 500 trees, 1,000 samples, 1,000 features and mtry = √p. The datasets mimic genetic data, consisting of p single nucleotide polymorphisms (SNPs) measured on n subjects.…”
Section: Runtime and Memory Usage
confidence: 99%
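As a rough illustration of the benchmark design quoted above, the following R sketch simulates SNP-like genotype data at the stated defaults (n = 1,000 subjects, p = 1,000 features, 500 trees, mtry = √p) and times ranger against randomForest. It is a minimal sketch under assumed data-generating settings, not the cited benchmark code, and it times only one configuration per package rather than varying each parameter in turn.

## A minimal sketch, not the cited benchmark code: SNP-like data
## (genotypes coded 0/1/2) at the quoted defaults.
library(ranger)
library(randomForest)

set.seed(42)
n <- 1000; p <- 1000
x <- matrix(sample(0:2, n * p, replace = TRUE), n, p)  # simulated SNP genotypes
colnames(x) <- paste0("snp", seq_len(p))
y <- factor(rbinom(n, 1, 0.5))
dat <- data.frame(y, x)
mtry <- floor(sqrt(p))

## Time one fixed configuration per package; the full benchmark described
## above varies n, p, mtry and the number of trees one at a time.
t_ranger <- system.time(ranger(y ~ ., data = dat, num.trees = 500, mtry = mtry))
t_rf     <- system.time(randomForest(y ~ ., data = dat, ntree = 500, mtry = mtry))
rbind(ranger = t_ranger, randomForest = t_rf)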
“…This package is studied in greater detail in Section 5. Finally, an RF implementation optimized for analyzing high-dimensional data is Random Jungle (Schwarz et al 2010; Kruppa et al 2014b). This package is only available as a C++ application with library dependencies, and it is not portable to R or another statistical programming language.…”
Section: Introduction
confidence: 99%
“…It is usually required by surgeons, oncologists, pathologists, pediatricians, and professionals in internal medicine and human genetics (Malley et al (2012)). For instance, carrier probabilities are calculated in genetic counseling, and the probability of treatment response is estimated in personalized medicine for each patient (Kruppa et al (2014b)).…”
Section: Introduction
confidence: 99%
“…For example, in safety-critical domains such as surgery, oncology, internal medicine, pathology, paediatrics and human genetics, these probabilities are needed. In all the aforementioned areas, probability estimates are more useful than simple classification, as they provide a measure of the reliability of the decision taken about an individual (Lee et al (2010), Malley et al (2012), Kruppa et al (2012), Kruppa et al (2014a, 2014b)). Machine learning techniques used mainly for classification can be used as non-parametric methods for class membership probability estimation, in order to avoid the assumptions imposed by the parametric models used to estimate these probabilities (Malley et al (2012)).…”
Section: Introduction
confidence: 99%
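To make the quoted distinction concrete, the following R sketch contrasts hard classification with class-membership probability estimation for a multicategory outcome using ranger's probability forests, which implement the probability estimation trees of Malley et al (2012); the iris data and all settings are illustrative assumptions, not those of the cited studies.

## A minimal sketch of probability estimation vs. plain classification for a
## multicategory outcome; the iris data and settings are illustrative only.
library(ranger)

set.seed(7)
fit_class <- ranger(Species ~ ., data = iris)                      # hard classification
fit_prob  <- ranger(Species ~ ., data = iris, probability = TRUE)  # probability forest

new_obs <- iris[c(1, 51, 101), ]   # one observation per species

predict(fit_class, data = new_obs)$predictions  # class labels only
predict(fit_prob,  data = new_obs)$predictions  # per-class probabilities: a
                                                # measure of reliability behind
                                                # each predicted label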