2018
DOI: 10.3390/ijms19082267

ClusterMI: Detecting High-Order SNP Interactions Based on Clustering and Mutual Information

Abstract: Identifying single nucleotide polymorphism (SNP) interactions is considered as a popular and crucial way for explaining the missing heritability of complex diseases in genome-wide association studies (GWAS). Many approaches have been proposed to detect SNP interactions. However, existing approaches generally suffer from the high computational complexity resulting from the explosion of candidate high-order interactions. In this paper, we propose a two-stage approach (called ClusterMI) to detect high-order genom…

Cited by 19 publications (11 citation statements)
References 52 publications
“…For the three‐locus study, we do not take MOMDR as a comparing method for its overwhelming computational burden. Instead, we add two specifically designed high‐order epistasis approaches (ClusterMI, Cao et al, ; HiSeeker, Liu et al, ) for experiments. For a fair comparison, ClusterMI and HiSeeker employ the exhaustive search to detect epistatic interactions in the search stage.…”
Section: Results (mentioning)
confidence: 99%
“…Filter and wrapper based feature selection methodologies are two common screening methods (Long, Gianola, Rosa, Weigel, & Avendaño, 2007; Saeys, Inza, & Larrañaga, 2007). Filter-based methods aim to select a subset of SNPs as a candidate set for interaction tests on the basis of existing biological knowledge (i.e., databases of pathways and protein-protein interactions; Ritchie, 2011; Turner et al, 2011), statistical features (i.e., marginal effects; Ma et al, 2012; and genotype frequencies; Ackermann & Beyer, 2012; Guo, Meng, Yu, & Pan, 2014; Xie et al, 2011) or fast algorithms (Cao, Yu, Liu, Jia, & Wang, 2018; Liu, Yu, Jiang, & Wang, 2017; Yang et al, 2008). Wrapper-based methods apply random sampling procedures (i.e., Markov chain Monte Carlo, MCMC; Zhang & Liu, 2007) and the Gibbs sampling; Tang, Wu, Jiang, & Li, 2009), heuristic algorithms (i.e., ant colony optimization, ACO; Sapin, Keedwell, & Frayling, 2015; Wang, Liu, Robbins, & Rekaya, 2010) and differential evolution, DE; C.-H. Yang et al, 2017) or machine learning algorithm (i.e., random forest, RF; Schwarz, König, & Ziegler, 2010; Yoshida & Koike, 2011), support vector machine (SVM; Chen et al, 2008; Marvel & Motsinger-Reif, 2012) and neural network (NN; Uppu et al, 2016) to search the space of interactions.…”
Section: Introduction (mentioning)
confidence: 99%
“…An effective objective function has the ability to guide the SIS algorithm to explore some clues (such as different distributions of genotypes) that can further lead the algorithm to find high-order SNP interactions on a genome-wide scale. In existing research, the most common evaluation criteria (objective functions) involve the Bayesian-network-based score [58]-[60], mutual information [61], [62], logistic-regression-based score [63], [64], MDR [65], [66], Gini index [67]-[70], statistical test methods (e.g., chi-square test, G-test, and t-test) etc. These criteria usually have a high precision in evaluating a pure k-order SNP interaction (in which the k SNPs jointly affect complex diseases, and the number of SNPs is the same); however, they are ineffective for determining the association difference of SNP combinations that contain only some of the disease-causing SNPs.…”
Section: Discussion (mentioning)
confidence: 99%
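
The point in the citation statement above, that mutual information scores a pure k-order interaction well but says little about a combination holding only some of the causal SNPs, can be illustrated with a small sketch. This is not the ClusterMI implementation. The function name `mutual_information`, the toy genotype matrix, and the XOR-style disease model are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(genotypes, phenotypes, snp_indices):
    """Return I(S; Y) in bits, where S is the joint genotype at snp_indices
    and Y is the phenotype label. Hypothetical helper for illustration."""
    n = len(phenotypes)
    # Joint genotype of the selected SNPs for each sample.
    combos = [tuple(row[i] for i in snp_indices) for row in genotypes]
    joint_counts = Counter(zip(combos, phenotypes))
    combo_counts = Counter(combos)
    pheno_counts = Counter(phenotypes)

    mi = 0.0
    for (combo, label), count in joint_counts.items():
        p_joint = count / n
        # p(s, y) / (p(s) * p(y)) simplifies to count * n / (count_s * count_y).
        ratio = count * n / (combo_counts[combo] * pheno_counts[label])
        mi += p_joint * math.log2(ratio)
    return mi


# Toy data (assumed): two SNPs whose XOR fully determines case/control status.
toy_genotypes = [[0, 0], [0, 1], [1, 0], [1, 1]]
toy_phenotypes = [0, 1, 1, 0]

mi_pair = mutual_information(toy_genotypes, toy_phenotypes, [0, 1])
mi_single = mutual_information(toy_genotypes, toy_phenotypes, [0])
print(f"pair: {mi_pair:.3f} bits, single SNP: {mi_single:.3f} bits")
```

In this toy model the full pair carries one bit of information about the phenotype, while either SNP alone carries none, so a search guided only by this score gets no signal from a partial combination.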
“…Furthermore, the expressions of ∈ (1, ∞). Recent studies that include simulations based on epistasis models to generate their evaluation data [20][21][22] settle on low-order models whose heritability values are worryingly moderate. However, real-world diseases are usually determined by a higher number of genes [1] and a higher heritability [23,24].…”
Section: Model Restrictions and Existing Epistasis Models (mentioning)
confidence: 99%