2016
DOI: 10.1186/s12919-016-0020-2

Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data

Abstract: Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vect…

Cited by 29 publications (30 citation statements)
References: 7 publications
“…In a more direct way, Held et al [14] built support vector machine models to predict disease status from genes that simultaneously collapse genotype variants and use gene expression effects. Specifically, based on 637 individuals with a simulated hypertension phenotype, some or all of the first 150 simulated data sets were used for a selection of interesting genes (training), and three from the remaining 50 simulated data sets were used for classification (testing).…”
Section: Message #2: Exploiting the Information From Different Data T…
Citation type: mentioning
confidence: 99%
“…The required hyperparameters are derived from cross-validation. Held et al [14] find that the predictive performance is slightly higher for a support vector machine with a linear kernel than for the other methods. With logistic regression and use of a radial kernel, the performance decreases with a greater number of genes.…”
Section: Message #2: Exploiting the Information From Different Data T…
Citation type: mentioning
confidence: 99%
“…At the second stage, SoftMax regression was applied to classify the health status of individuals using the learned features. Brain images have been obtained under various health conditions [24]. These images have constructed a (6)…”
Section: Proposed Methods
Citation type: mentioning
confidence: 99%