Background Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high-dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection. Results Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related to glioblastoma. Conclusions The proposed sparse PCA methods, Fused and Grouped sparse PCA, can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings, and potentially providing insights into the molecular underpinnings of complex diseases. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1740-7) contains supplementary material, which is available to authorized users.
Summary Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this paper our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach.
Aims To test the hypothesis that a 50-g oral glucose challenge test with 1-h glucose measurement would have superior performance compared with other opportunistic screening methods. Methods In this prospective study in a Veterans Health Administration primary care clinic, the performances of the following tests, measured by area under receiver-operating characteristic curves, were compared: oral glucose challenge test; random glucose; and HbA1c level, using an oral glucose tolerance test as the ‘gold standard’. Results The study population comprised 1535 people (mean age 56 years, BMI 30.3 kg/m², 94% men, 74% black). By oral glucose tolerance test criteria, diabetes was present in 10% and high-risk prediabetes was present in 22% of the cohort. The plasma glucose challenge test provided area under receiver-operating characteristic curves of 0.85 (95% CI 0.78–0.91) to detect diabetes and 0.76 (95% CI 0.72–0.80) to detect high-risk dysglycaemia (diabetes or high-risk prediabetes), while area under receiver-operating characteristic curves for the capillary glucose challenge test were 0.82 (95% CI 0.75–0.89) and 0.73 (95% CI 0.69–0.77) for diabetes and high-risk dysglycaemia, respectively. Random glucose performed less well [plasma: 0.76 (95% CI 0.69–0.82) and 0.66 (95% CI 0.62–0.71), respectively; capillary: 0.72 (95% CI 0.65–0.80) and 0.64 (95% CI 0.59–0.68), respectively] and HbA1c performed even less well [0.67 (95% CI 0.57–0.76) and 0.63 (95% CI 0.58–0.68), respectively]. The cost of identifying one case of high-risk dysglycaemia with a plasma glucose challenge test would be $42 from a Veterans Affairs perspective, and $55 from a US Medicare perspective. Conclusions Glucose challenge test screening, followed, if abnormal, by an oral glucose tolerance test, would be convenient and more accurate than other opportunistic tests. Use of glucose challenge test screening could improve management by permitting earlier therapy.
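For readers unfamiliar with the performance measure used in the abstract above, the following is a minimal sketch of how an area under the receiver-operating characteristic curve can be computed from a screening measurement and a reference-standard diagnosis. It uses the rank-based (Mann-Whitney) formulation, which may differ from the estimation procedure the study authors used. The function name `auc` and the toy values are hypothetical and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    scores: screening measurements (higher means more suspicious).
    labels: 1 if the reference standard says the condition is present, else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up illustration values (NOT study data): glucose readings and
# a hypothetical reference-standard diagnosis for six people.
glucose = [95, 110, 130, 150, 170, 200]
has_diabetes = [0, 0, 0, 1, 0, 1]
print(auc(glucose, has_diabetes))
```

A value of 0.5 corresponds to a test that ranks cases no better than chance, and 1.0 to a test that ranks every affected person above every unaffected person.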