Summary
For understanding complex diseases, gene-environment (G-E) interactions have important implications beyond the main G and E effects. Most existing analysis approaches and software packages cannot accommodate data contamination or long-tailed distributions. We develop GEInter, a comprehensive R package tailored to robust G-E interaction analysis. For both marginal and joint analysis, for data with and without missingness, and for continuous and censored survival responses, it conducts identification, estimation, visualization, and prediction. It fills an important gap in the existing literature and has broad applicability.
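To make "robust G-E interaction analysis" concrete, the sketch below builds a design matrix with E main effects, G main effects, and all G×E products, then fits it with a Huber loss via iteratively reweighted least squares, so that a few grossly contaminated responses have bounded influence. It is written in Python for illustration only and is not the estimator implemented in GEInter (an R package); the function names `ge_design` and `huber_irls`, the tuning constant 1.345, and the simulated data are hypothetical choices for this example.

```python
import numpy as np

def ge_design(G, E):
    """Hypothetical helper: design matrix [E, G, G*E_1, ..., G*E_q]."""
    q = E.shape[1]
    # Each block multiplies every gene column by one environmental factor.
    inter = np.hstack([G * E[:, [k]] for k in range(q)])
    return np.hstack([E, G, inter])

def huber_irls(X, y, delta=1.345, n_iter=200, tol=1e-8):
    """Huber M-estimate (intercept first) via iteratively reweighted least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]  # start from ordinary least squares
    for _ in range(n_iter):
        r = y - X1 @ beta
        # Robust residual scale: normalized median absolute deviation.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        # Huber weights: 1 for small residuals, delta/|u| for large ones.
        w = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta

# Simulated (hypothetical) data: 3 genes, 1 environmental factor, 10% contamination.
rng = np.random.default_rng(0)
n = 200
G = rng.normal(size=(n, 3))
E = rng.normal(size=(n, 1))
y = 1 + 0.5 * E[:, 0] + G[:, 0] + 2 * G[:, 0] * E[:, 0] + rng.normal(size=n)
y[rng.choice(n, size=20, replace=False)] += 50  # gross outliers in the response

X = ge_design(G, E)  # columns: E, G1, G2, G3, G1*E, G2*E, G3*E
beta_huber = huber_irls(X, y)
beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
print("G1*E  Huber: %.2f  OLS: %.2f" % (beta_huber[5], beta_ols[5]))
```

Because each outlier contributes at most `delta * scale` to the estimating equations, the Huber fit stays close to the true G1×E coefficient of 2, whereas the least-squares fit absorbs the contamination.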
Availability and implementation
https://cran.r-project.org/web/packages/GEInter/.
Supplementary information
Supplementary data are available at Bioinformatics online.
Increasing evidence has shown that gene-gene interactions have important effects on the biological processes underlying human diseases. Due to the high dimensionality of genetic measurements, existing interaction analysis methods usually suffer from a lack of sufficient information and remain unsatisfactory. Large amounts of biological network information have accumulated, allowing researchers to identify biomarkers from a system perspective by utilizing both network selection (where each network consists of functionally related biomarkers) and network structures. In main-effect analysis, network information has been widely incorporated, leading to more biologically meaningful and more accurate estimates. However, a large gap remains in the context of interaction analysis. In this study, we develop a novel structured Bayesian interaction analysis approach that effectively incorporates network information. This study is among the first to identify gene-gene interactions with the assistance of network selection for phenotype prediction while simultaneously accommodating the underlying network structures. It innovatively respects the multiple hierarchies among main effects, interactions, and networks. A Bayesian approach is adopted, which has been shown to have multiple advantages over some other techniques. An efficient variational inference algorithm is developed to explore the posterior distribution. Extensive simulation studies demonstrate the practical superiority of the proposed approach. The analysis of TCGA data on melanoma and lung cancer leads to biologically sensible findings with satisfactory prediction accuracy and selection stability.
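The combination of variational inference and hierarchy among main effects and interactions can be illustrated with a generic, swapped-in model. The sketch below runs coordinate-ascent variational inference (CAVI) for a spike-and-slab linear regression over gene main effects and all pairwise gene-gene interactions, then applies a strong-hierarchy filter (an interaction is kept only if both of its main effects are kept). It assumes a known noise variance and fixed prior hyperparameters, ignores network information entirely, and is not the structured prior of the approach described above; `cavi_spike_slab`, the 0.5 threshold, and the simulated data are hypothetical.

```python
import numpy as np
from itertools import combinations

def cavi_spike_slab(X, y, sigma2=1.0, sb2=1.0, pi=0.1, n_iter=200, tol=1e-6):
    """CAVI for y = X beta + e, beta_j = gamma_j * b_j, gamma_j ~ Bernoulli(pi),
    b_j ~ N(0, sigma2 * sb2), e ~ N(0, sigma2). Returns inclusion probs and slab means."""
    n, p = X.shape
    xtx = np.sum(X ** 2, axis=0)
    s2 = sigma2 / (xtx + 1.0 / sb2)  # variational slab variances (fixed here)
    alpha = np.full(p, pi)
    mu = np.zeros(p)
    r = y - X @ (alpha * mu)  # residual under the current variational mean
    logit_pi = np.log(pi / (1 - pi))
    for _ in range(n_iter):
        alpha_old = alpha.copy()
        for j in range(p):
            r += X[:, j] * (alpha[j] * mu[j])  # remove coordinate j from the fit
            mu[j] = s2[j] / sigma2 * (X[:, j] @ r)
            logit = logit_pi + 0.5 * np.log(s2[j] / (sb2 * sigma2)) + mu[j] ** 2 / (2 * s2[j])
            alpha[j] = 1.0 / (1.0 + np.exp(-np.clip(logit, -500, 500)))
            r -= X[:, j] * (alpha[j] * mu[j])  # add the updated coordinate back
        if np.max(np.abs(alpha - alpha_old)) < tol:
            break
    return alpha, mu

# Simulated (hypothetical) data: 5 genes, 10 pairwise interactions.
rng = np.random.default_rng(1)
n, p = 200, 5
G = rng.normal(size=(n, p))
pairs = list(combinations(range(p), 2))
I = np.column_stack([G[:, j] * G[:, k] for j, k in pairs])
X = np.hstack([G, I])
X -= X.mean(axis=0)  # center columns so no intercept is needed
y = 2 * G[:, 0] + 1.5 * G[:, 1] + 2 * G[:, 0] * G[:, 1] + rng.normal(size=n)
y -= y.mean()

alpha, mu = cavi_spike_slab(X, y)
main = alpha[:p] > 0.5
# Strong hierarchy (assumed rule): keep an interaction only if both parents are kept.
selected = [pair for pair, a in zip(pairs, alpha[p:])
            if a > 0.5 and main[pair[0]] and main[pair[1]]]
print("selected interactions:", selected)
```

Each coordinate update is closed form, which is why variational schemes of this kind scale to many candidate interactions more cheaply than sampling-based inference.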
Genetic interactions play an important role in the progression of complex diseases, explaining variations in disease phenotypes that main genetic effects miss. Comparatively, there are fewer studies on survival time, given its challenging characteristics such as censoring. In recent biomedical research, two-level analysis of both genes and the pathways they are involved in has received much attention and has been shown to be more effective than single-level analysis. However, such analysis is usually limited to main effects. Pathways are not isolated, and their interactions have also been suggested to contribute importantly to the prognosis of complex diseases. In this paper, we develop a novel two-level Bayesian interaction analysis approach for survival data. This approach is the first to analyze lower-level gene-gene interactions and higher-level pathway-pathway interactions simultaneously. Significantly advancing from existing Bayesian studies based on the Markov Chain Monte Carlo (MCMC) technique, we propose a variational inference framework based on the accelerated failure time model with effective priors to accommodate two-level selection as well as censoring. Its computational efficiency is highly desirable for high-dimensional interaction analysis. We examine the performance of the proposed approach using extensive simulation. The application to TCGA melanoma and lung adenocarcinoma data leads to biologically sensible findings with satisfactory prediction accuracy and selection stability.
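A simple, widely used way to fit an accelerated failure time (AFT) model under right censoring is weighted least squares on log survival time, with Kaplan-Meier (Stute) weights that give censored subjects zero weight and redistribute their mass to later events. The sketch below uses that estimator as a stand-in, assuming censoring independent of covariates and survival times; it is not the variational Bayesian procedure described above, and `km_weights`, `aft_wls`, and the simulated data are hypothetical.

```python
import numpy as np

def km_weights(time, event):
    """Kaplan-Meier (Stute) weights: KM jump size attached to each subject."""
    n = len(time)
    order = np.argsort(time, kind="stable")
    d = event[order].astype(float)
    w_sorted = np.zeros(n)
    surv = 1.0  # KM survival just before the i-th ordered time
    for i in range(n):
        w_sorted[i] = d[i] / (n - i) * surv
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    w = np.empty(n)
    w[order] = w_sorted  # map weights back to the original subject order
    return w

def aft_wls(X, time, event):
    """Weighted least squares of log(time) on X (intercept first)."""
    w = km_weights(time, event)
    X1 = np.column_stack([np.ones(len(time)), X])
    sw = np.sqrt(w)
    return np.linalg.lstsq(X1 * sw[:, None], np.log(time) * sw, rcond=None)[0]

# Simulated (hypothetical) data: two genes and their interaction.
rng = np.random.default_rng(2)
n = 500
G = rng.normal(size=(n, 2))
X = np.column_stack([G, G[:, 0] * G[:, 1]])
log_t = 1 + 0.5 * G[:, 0] - 0.5 * G[:, 1] + 0.8 * X[:, 2] + 0.5 * rng.normal(size=n)
log_c = rng.uniform(2, 6, size=n)  # independent censoring times on the log scale
time = np.exp(np.minimum(log_t, log_c))
event = (log_t <= log_c).astype(int)
print("censoring rate:", 1 - event.mean())
print("coefficients:", aft_wls(X, time, event))
```

With no censoring every weight equals 1/n and the fit reduces to ordinary least squares on log time.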