A support vector machine approach for detecting gene‐gene interaction

Chen, Shyh‐Huei; Sun, Jielin; Dimitrov, Latchezar; Turner, Aubrey R.; Adams, Tamara S.; Meyers, Deborah A.; Chang, Bao‐Li; Zheng, S. Lilly; Grönberg, Henrik; Xu, Jianfeng; Hsu, Fang‐Chi

doi:10.1002/gepi.20272

Cited by 100 publications

(86 citation statements)

References 25 publications

“…RA and other nominal data methods are inherently more appropriate to studying genomic data than other approaches such as neural nets or support vector machines (Chen et al, 2008) that presuppose metric information. The predictive relation in an RA (or LL, LR, or BN) model is precisely the conditional probability of the discrete output, given the discrete inputs.…”

Section: Discussion (mentioning)

confidence: 99%

Reconstructability Analysis of Epistasis

Zwick

2010

Annals of Human Genetics

Summary: The literature on epistasis describes various methods to detect epistatic interactions and to classify different types of epistasis. Reconstructability analysis (RA) has recently been used to detect epistasis in genomic data. This paper shows that RA offers a classification of types of epistasis at three levels of resolution (variable-based models without loops, variable-based models with loops, state-based models). These types can be defined by the simplest RA structures that model the data without information loss; a more detailed classification can be defined by the information content of multiple candidate structures. The RA classification can be augmented with structures from related graphical modeling approaches. RA can analyze epistatic interactions involving an arbitrary number of genes or SNPs and constitutes a flexible and effective methodology for genomic analysis.

“…A total of 13 SNPs associated with obesity and T2D related traits, and prostate cancer are used as inputs for our prediction. Seven machine learning models simulated for the prediction of obesity were used, including: gradient boosting [43], generalised linear model [44], classification trees [45], k-nearest neighbours (KNN) [46], support vector machine (SVM) [47], random forest (RF) [48] and multilayer perceptron (MLP) neural network [49] trained using backpropagation.…”

Section: Methods (mentioning)

confidence: 99%

Machine learning approaches for the prediction of obesity using publicly available genetic profiles

Montañez, Fergus, Hussain, et al.

2017

2017 International Joint Conference on Neural Networks (IJCNN)

Abstract: This paper presents a novel approach based on the analysis of genetic variants from publicly available genetic profiles and the manually curated database, the National Human Genome Research Institute Catalog. Using data science techniques, genetic variants are identified in the collected participant profiles and then indexed as risk variants in the National Human Genome Research Institute Catalog. Indexed genetic variants or Single Nucleotide Polymorphisms are used as inputs in various machine learning algorithms for the prediction of obesity. Body mass index status of participants is divided into two classes, Normal Class and Risk Class. Dimensionality reduction tasks are performed to generate a set of principal variables (13 SNPs) for the application of various machine learning methods. The models are evaluated using receiver operator characteristic curves and the area under the curve. Machine learning techniques including gradient boosting, generalized linear model, classification and regression trees, k-nearest neighbours, support vector machines, random forest and multilayer perceptron neural network are comparatively assessed in terms of their ability to identify the most important factors among the initial 6622 variables describing genetic variants, age and gender, to classify a subject into one of the body mass index related classes defined in this study. Our simulation results indicated that support vector machine generated the highest area under the curve value of 90.5%.

“…(They can also be applied to quantitative or ordinal variables by binning.) By contrast, certain other machine learning methods such as neural nets [2], [28] or support vector machines [29], presuppose metric information and are thus less inherently suited for genomic analyses.…”

Section: Reconstructability Analysis (mentioning)

confidence: 99%