The increased availability of multi-view data (data on the same samples from multiple sources) has led to strong interest in models based on low-rank matrix factorizations. These models represent each data view via shared and individual components, and have been successfully applied for exploratory dimension reduction, association analysis between the views, and consensus clustering. Despite these advances, there remain challenges in modeling partially-shared components and identifying the number of components of each type (shared/partially-shared/individual). We formulate a novel linked component model that directly incorporates partially-shared structures. We call this model SLIDE for Structural Learning and Integrative DEcomposition of multi-view data. The proposed model-fitting and selection techniques allow for joint identification of the number of components of each type, in contrast to existing sequential approaches. In our empirical studies, SLIDE demonstrates excellent performance in both signal estimation and component selection. We further illustrate the methodology on the breast cancer data from The Cancer Genome Atlas repository.
KEYWORDS: data integration, dimension reduction, multiblock methods, principal component analysis, structured sparsity
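As a rough illustration of the shared/partially-shared/individual distinction described above, the sketch below encodes which view uses which component in a binary views-by-components matrix and simulates noiseless views from common scores. The function names, the three-view layout, and the Gaussian simulation are assumptions made for this example; this is not the SLIDE fitting or selection procedure.

```python
import numpy as np

def classify_components(structure):
    """Label each column of a binary views-by-components matrix."""
    n_views = structure.shape[0]
    labels = []
    for col in structure.T:
        active = int(col.sum())
        if active == n_views:
            labels.append("shared")
        elif active == 1:
            labels.append("individual")
        elif active == 0:
            labels.append("empty")
        else:
            labels.append("partially-shared")
    return labels

def simulate_views(structure, dims, n=50, seed=0):
    """Simulate noiseless views X_k = U V_k^T with block-sparse loadings V_k."""
    rng = np.random.default_rng(seed)
    n_components = structure.shape[1]
    scores = rng.normal(size=(n, n_components))  # common score matrix U
    views = []
    for dim, row in zip(dims, structure):
        # Zero out the loading columns of components this view does not use.
        loadings = rng.normal(size=(dim, n_components)) * row
        views.append(scores @ loadings.T)
    return scores, views

# Hypothetical structure: rows are views, columns are components.
structure = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
])
labels = classify_components(structure)
scores, views = simulate_views(structure, dims=[10, 8, 6])
ranks = [int(np.linalg.matrix_rank(v)) for v in views]
print(labels)
print(ranks)
```

In this toy setup the rank of each simulated view equals the number of components active in that view, which is the quantity a selection procedure would have to recover from noisy data.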
In modern biomedical research, it is ubiquitous to have multiple data sets measured on the same set of samples from different views (i.e., multi-view data). For example, in genetic studies, multiple genomic data sets at different molecular levels or from different cell types are measured for a common set of individuals to investigate genetic regulation. Integration and reduction of multi-view data have the potential to leverage information in different data sets, and to reduce the magnitude and complexity of data for further statistical analysis and interpretation. In this article, we develop a novel statistical model, called supervised integrated factor analysis (SIFA), for integrative dimension reduction of multi-view data while incorporating auxiliary covariates. The model decomposes data into joint and individual factors, capturing the joint variation across multiple data sets and the individual variation specific to each set, respectively. Moreover, both joint and individual factors are partially informed by auxiliary covariates via nonparametric models. We devise a computationally efficient Expectation-Maximization (EM) algorithm to fit the model under some identifiability conditions. We apply the method to the Genotype-Tissue Expression (GTEx) data, and provide new insights into the variation decomposition of gene expression in multiple tissues. Extensive simulation studies and an additional application to a pediatric growth study demonstrate the advantage of the proposed method over competing methods.
Dimension reduction of high-dimensional microbiome data facilitates subsequent analysis such as regression and clustering. Most existing reduction methods cannot fully accommodate the special features of the data, such as count-valued reads and an excess of zeros. We propose a zero-inflated Poisson factor analysis (ZIPFA) model in this article. The model assumes that microbiome absolute abundance data follow zero-inflated Poisson distributions with library size as offset and Poisson rates negatively related to the inflated zero occurrences. The latent parameters of the model form a low-rank matrix consisting of interpretable loadings and low-dimensional scores which can be used for further analyses. We develop an efficient and robust expectation-maximization (EM) algorithm for parameter estimation. We demonstrate the efficacy of the proposed method using comprehensive simulation studies. The application to the Oral Infections, Glucose Intolerance and Insulin Resistance Study (ORIGINS) provides valuable insights into the relation between subgingival microbiome and periodontal disease.
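To make the zero-inflated Poisson assumption concrete, here is a minimal sketch of the log-probability of a single count. The link in `inflation_prob` (a zero-inflation probability that decreases as the Poisson rate grows, governed by a hypothetical parameter `tau`) and the way the library size enters the rate are illustrative assumptions, not necessarily the exact forms used in ZIPFA.

```python
import math

def zip_logpmf(x, lam, p_zero):
    """Log-probability of count x under a zero-inflated Poisson.

    With probability p_zero the count is a structural zero; otherwise it is
    drawn from Poisson(lam).
    """
    if x == 0:
        return math.log(p_zero + (1.0 - p_zero) * math.exp(-lam))
    return math.log1p(-p_zero) + x * math.log(lam) - lam - math.lgamma(x + 1)

def inflation_prob(lam, tau):
    """Illustrative link: zero-inflation probability decreasing in the rate."""
    return 1.0 / (1.0 + lam ** tau)

# Assumed offset form: the rate scales with library size.
library_size = 1000.0
log_relative_rate = math.log(0.003)  # hypothetical low-rank parameter value
lam = library_size * math.exp(log_relative_rate)
p_zero = inflation_prob(lam, tau=1.0)

# Sanity check: the probabilities over a wide range of counts sum to about 1.
total = sum(math.exp(zip_logpmf(k, lam, p_zero)) for k in range(60))
print(round(total, 6))
```

An EM algorithm for such a model would treat the structural-zero indicator as the latent variable; that step is not shown here.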
Background: Despite tremendous efforts, the incidence of surgical site infection (SSI) following the surgical treatment of pediatric spinal deformity remains a concern. Although previous studies have reported some risk factors for SSI, these studies have been limited by not being able to investigate multiple risk factors at the same time. The aim of the present study was to evaluate a wide range of preoperative and intraoperative factors in predicting SSI and to develop and validate a prediction model that quantifies the risk of SSI for individual pediatric spinal deformity patients.
Methods: Pediatric patients with spinal deformity who underwent primary, revision, or definitive spinal fusion at 1 of 7 institutions were included. Candidate predictors were known preoperatively and were not modifiable in most cases; these included 31 patient, 12 surgical, and 4 hospital factors. The Centers for Disease Control and Prevention definition of SSI within 90 days of surgery was utilized. Following multiple imputation and multicollinearity testing, predictor selection was conducted with use of logistic regression to develop multiple models. The data set was randomly split into training and testing sets, and fivefold cross-validation was performed to compare discrimination, calibration, and overfitting of each model and to determine the final model. A risk probability calculator and a mobile device application were developed from the model in order to calculate the probability of SSI in individual patients.
Results: A total of 3,092 spinal deformity surgeries were included, in which there were 132 cases of SSI (4.3%). The final model achieved adequate discrimination (area under the receiver operating characteristic curve: 0.76), as well as calibration and no overfitting. Predictors included in the model were nonambulatory status, neuromuscular etiology, pelvic instrumentation, procedure time ≥7 hours, American Society of Anesthesiologists grade >2, revision procedure, hospital spine surgical cases <100/year, abnormal hemoglobin level, and overweight or obese body mass index.
Conclusions: The risk probability calculator encompassing patient, surgical, and hospital factors developed in the present study predicts the probability of 90-day SSI in pediatric spinal deformity surgery. This validated calculator can be utilized to improve informed consent and shared decision-making and may allow the deployment of additional resources and strategies selectively in high-risk patients.
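The discrimination measure reported above, the area under the receiver operating characteristic curve, can be read as the probability that a randomly chosen infected case receives a higher predicted risk than a randomly chosen uninfected case. The sketch below computes it by direct pairwise comparison; the function name and the four toy predictions are assumptions for illustration and have no connection to the study data or its model.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy predicted risks and outcomes (1 = event, 0 = no event).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
auc = auc_from_scores(toy_scores, toy_labels)
print(auc)  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a rank-based formula gives the same value more efficiently on large data sets.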
Background: Expression quantitative trait loci (eQTL) analysis identifies genetic markers associated with the expression of a gene. Most existing eQTL analyses and methods investigate association in a single, readily available tissue, such as blood. Joint analysis of eQTL in multiple tissues has the potential to improve, and expand the scope of, single-tissue analyses. Large-scale collaborative efforts such as the Genotype-Tissue Expression (GTEx) program are currently generating high quality data in a large number of tissues. However, computational constraints limit genome-wide multi-tissue eQTL analysis.
Results: We develop an integrative method under a hierarchical Bayesian framework for eQTL analysis in a large number of tissues. The model fitting procedure is highly scalable, and the computing time is a polynomial function of the number of tissues. Multi-tissue eQTLs are identified through a local false discovery rate approach, which rigorously controls the false discovery rate. Using simulation and GTEx real data studies, we show that the proposed method has superior performance to existing methods in terms of computing time and the power of eQTL discovery.
Conclusions: We provide a scalable method for eQTL analysis in a large number of tissues. The method enables the identification of eQTL with different configurations and facilitates the characterization of tissue specificity.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2088-3) contains supplementary material, which is available to authorized users.
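One common way to turn local false discovery rates into a rejection set is to reject the hypotheses with the smallest values for as long as their running average stays at or below the target level. The sketch below shows that generic rule; the abstract does not spell out the exact thresholding procedure, so the function name, the rule itself, and the toy values are assumptions.

```python
import numpy as np

def lfdr_reject(lfdr, alpha=0.05):
    """Reject the largest set of smallest-lfdr tests whose mean lfdr <= alpha."""
    lfdr = np.asarray(lfdr, dtype=float)
    order = np.argsort(lfdr)
    # Running mean of the sorted lfdr values estimates the FDR of each
    # candidate rejection set; it is nondecreasing because of the sort.
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    reject = np.zeros(lfdr.size, dtype=bool)
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject

# Toy posterior null probabilities for five gene-SNP pairs.
toy_lfdr = [0.01, 0.9, 0.03, 0.2, 0.5]
rejected = lfdr_reject(toy_lfdr, alpha=0.1)
print(rejected.tolist())  # [True, False, True, True, False]
```

With the toy values, the three smallest local false discovery rates average 0.08, so those three tests are rejected at level 0.1; adding the fourth would raise the average to 0.185.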