2018
DOI: 10.1186/s12859-018-2054-0

A method combining a random forest-based technique with the modeling of linkage disequilibrium through latent variables, to run multilocus genome-wide association studies

Abstract: Background: Genome-wide association studies (GWASs) have been widely used to discover the genetic basis of complex phenotypes. However, standard single-SNP GWASs suffer from lack of power. In particular, they do not directly account for linkage disequilibrium, that is, the dependences between SNPs (Single Nucleotide Polymorphisms). Results: We present the comparative study of two multilocus GWAS strategies, in the random forest-based framework. The first method, T-Trees, was designed by Botta and collaborators (Bott…

Cited by 7 publications (3 citation statements)
References 42 publications
“…Boosting algorithms have been used successfully to solve the case of genome-wide data sets due to linkage disequilibrium. Another comparison study that used genetic data has been done by [13]. The new variant Decision Tree (DT) has been proposed and compared with Random forest using Genome-wide association studies (GWASs) dataset to discover the genetic basis of complex phenotypes.…”
Section: Review Machine Learning Methods For Diabetes Prediction
Citation type: mentioning
confidence: 99%
“…It was a part of the hybrid approach to defining latent variables of a Bayesian network to discover the genetic bases of complex phenotypes. As a result, the GWAS approach utilizing DBSCAN to model LD outperformed the approach based on traditional LD modeling through blocks of contiguous SNPs [23]. This investigation forced us to consider density-based spatial algorithms as promising for the purpose of our research.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…To address this issue effectively, some improved multi-locus GWAS methods were proposed. For example, multi-locus mixed-model (MLMM) [26] adopts stepwise mixed-model regression with forward inclusion and backward elimination using a Bayesian approach and performs well when the structure is complex, fixed and random model circulating probability unification (FarmCPU) [27] incorporates multiple markers simultaneously as covariates in a stepwise LMM to partially remove the confounding between testing markers and kinship, iterative nonlocal prior-based selection (GWASinlps) [28] considers an iterative structured screen-and-select strategy and nonlocal priors within it and provides an efficient and parsimonious variable selection for continuous phenotypes, a machine-learning method combines a random forest-based technique with the modeling of linkage disequilibrium through latent variables [29] and accelerates the computing speed for multi-locus GWAS, a gene set analysis with Generalized Berk-Jones (GBJ) statistic [30] introduces a permutation-free parametric framework, which can increase the power by incorporating information from multiple signals in the same gene, the SNP set GWAS approach RAINBOW [31] achieves faster computation by using linear kernel for constructing the Gram matrix of the SNP set of interest, and the multi-locus random SNP effect mixed linear model (mrMLM) [32] uses the Wald test based on a random SNP effect linear mixed model to reduce dimensionality; then, all the selected markers are placed into an empirical Bayes [33] multi-locus model, showing the advantage in controlling a complex population structure. A limitation of Bayesian method is that Markov Chain Monte Carlo (MCMC) sampling comes at the cost of intensive computation, or the posterior distribution of fitness is not easy to calculate [34].…”
Section: Introduction
Citation type: mentioning
confidence: 99%