Background: Rapid computational and technological developments have made large amounts of omics data available at different biological levels. It is becoming clear that simultaneous data analysis methods are needed for better interpretation and understanding of the underlying systems biology. Different methods have been proposed for this task, among them Partial Least Squares (PLS) related methods. To also deal with orthogonal variation, that is, systematic variation in one data set that is unrelated to the other, we consider Two-way Orthogonal PLS (O2PLS): an integrative data analysis method which is capable of modeling systematic variation, while providing more parsimonious models that aid interpretation. Results: A simulation study to assess the performance of O2PLS showed positive results in both low and higher dimensions. More noise (50 % of the data) only affected the systematic part estimates. A data analysis was conducted using metabolomics and transcriptomics data from a large Finnish cohort (DILGOM). A previous sequential study, using the same data, showed significant correlations between the Lipo-Leukocyte (LL) module and lipoprotein metabolites. The O2PLS results were in agreement with these findings, identifying almost the same set of co-varying variables. Moreover, our integrative approach identified other associated genes and metabolites, while taking into account systematic variation in the data. Including orthogonal components enhanced overall fit, but the orthogonal variation was difficult to interpret. Conclusions: Simulations showed that the O2PLS estimates were close to the true parameters in both low and higher dimensions. In the presence of more noise (50 %), the orthogonal part estimates could not distinguish well between joint and unique variation. The joint estimates were not systematically affected.
Simultaneous analysis with O2PLS on metabolome and transcriptome data showed that the LL module, together with VLDL and HDL metabolites, was important for the relation between the metabolome and the transcriptome. This is in agreement with an earlier study. In addition, more genes and metabolites were identified as important for the joint covariation.
Glycosylation is an abundant co- and post-translational protein modification of importance to protein processing and activity. Although not template-defined, glycosylation does reflect the biological state of an organism and is a high-potential biomarker for disease and patient stratification. However, to interpret a complex but informative sample like the total plasma N-glycome, it is important to establish its baseline association with plasma protein levels and systemic processes. Thus far, large-scale studies (n > 200) of the total plasma N-glycome have been performed with methods of chromatographic and electrophoretic separation, which, although informative, are limited in resolving the structural complexity of plasma N-glycans. MS has the opportunity to contribute additional information on, among others, antennarity, sialylation, and the identity of high-mannose type species. Here, we have used matrix-assisted laser desorption/ionization (…
Glycosylation is a ubiquitous co- and post-translational protein modification of functional relevance to the processing and activity of the conjugate. Examples include quality control during protein folding, regulation of circulatory half-life, and modulation of receptor interactions by either providing the recognition motif or by affecting protein conformation (1-7). Consequently, glycosylation has been associated with a multitude of diseases and states thereof, among which are the progression and metastasis of cancer and the remission of rheumatoid arthritis (8-11). Because the process of glycosylation is not template-defined, glycosylation integrates a large series of cellular conditions such as glycosidase/glycosyltransferase abundance and activity, endoplasmic reticulum …
With a rapid increase in volume and complexity of data sets, there is a need for methods that can extract useful information, for example the relationship between two data sets measured for the same persons. The Partial Least Squares (PLS) method can be used for this dimension reduction task. Within life sciences, results across studies are compared and combined. Therefore, parameters need to be identifiable, which is not the case for PLS. In addition, PLS is an algorithm, while epidemiological study designs are often outcome-dependent and methods to analyze such data require a probabilistic formulation. Moreover, a probabilistic model provides a statistical framework for inference. To address these issues, we develop Probabilistic PLS (PPLS). We derive maximum likelihood estimators that satisfy the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. We show that the PPLS parameters are identifiable up to sign. A simulation study is conducted to study the performance of PPLS compared to existing methods. The PPLS estimates performed well in various scenarios, even in high dimensions. Most notably, the estimates seem to be robust against departures from normality. To illustrate our method, we applied it to IgG glycan data from two cohorts. Our PPLS model provided insight as well as interpretable results across the two cohorts.
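As a rough illustration of the dimension reduction that classical PLS performs on two data sets measured on the same individuals, the sketch below extracts the first pair of weight vectors by power iteration on the cross-product matrix X^T Y. It is not the PPLS estimator or the EM algorithm from the abstract; the function names, the simulated data, and the noise level are all assumptions made for this example. Note that (w, c) and (-w, -c) describe the same component, so the signs of the returned weights are arbitrary.

```python
import random

def center(A):
    """Subtract the column means from a matrix given as a list of rows."""
    n = len(A)
    means = [sum(col) / n for col in zip(*A)]
    return [[a - m for a, m in zip(row, means)] for row in A]

def cross_product(X, Y):
    """Return X^T Y for two matrices with the same number of rows."""
    return [[sum(x * y for x, y in zip(xc, yc)) for yc in zip(*Y)]
            for xc in zip(*X)]

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = sum(a * a for a in v) ** 0.5
    return [a / norm for a in v]

def first_pls_pair(X, Y, n_iter=200):
    """First pair of PLS weights (hypothetical helper): power iteration on X^T Y."""
    Xc, Yc = center(X), center(Y)
    M = cross_product(Xc, Yc)                 # p x q cross-product matrix
    c = normalize([1.0] * len(M[0]))          # arbitrary starting value for the Y weights
    for _ in range(n_iter):
        # w is proportional to M c, then c is proportional to M^T w
        w = normalize([sum(m * cj for m, cj in zip(row, c)) for row in M])
        c = normalize([sum(row[j] * wi for row, wi in zip(M, w))
                       for j in range(len(c))])
    t = [sum(x * wi for x, wi in zip(row, w)) for row in Xc]   # X scores
    u = [sum(y * cj for y, cj in zip(row, c)) for row in Yc]   # Y scores
    return w, c, t, u

def correlation(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Simulated example (assumed setup): one shared latent variable drives
# the first column of X and the second column of Y; the rest is noise.
random.seed(1)
n = 200
latent = [random.gauss(0, 1) for _ in range(n)]
X = [[(z if j == 0 else 0.0) + 0.1 * random.gauss(0, 1) for j in range(5)]
     for z in latent]
Y = [[(z if j == 1 else 0.0) + 0.1 * random.gauss(0, 1) for j in range(4)]
     for z in latent]

w, c, t, u = first_pls_pair(X, Y)
score_corr = correlation(t, u)
```

In this toy setup the weight vectors should concentrate on the two columns that carry the shared signal, and the X and Y scores should be strongly correlated. A probabilistic formulation such as PPLS additionally models the noise and supports likelihood-based inference, which this sketch does not attempt.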
Background: With the exponential growth in available biomedical data, there is a need for data integration methods that can extract information about relationships between the data sets. However, these data sets might have very different characteristics. For interpretable results, data-specific variation needs to be quantified. For this task, Two-way Orthogonal Partial Least Squares (O2PLS) has been proposed. To facilitate application and development of the methodology, free and open-source software is required. However, no such software is available for O2PLS. Results: We introduce OmicsPLS, an open-source implementation of the O2PLS method in R. It can handle both low- and high-dimensional datasets efficiently. Generic methods for inspecting and visualizing results are implemented. Both a standard and a faster alternative cross-validation method are available to determine the number of components. A simulation study shows good performance of OmicsPLS compared to alternatives, in terms of accuracy and CPU runtime. We demonstrate OmicsPLS by integrating genetic and glycomic data. Conclusions: We propose the OmicsPLS R package: a free and open-source implementation of O2PLS for statistical data integration. OmicsPLS is available at https://cran.r-project.org/package=OmicsPLS and can be installed in R via install.packages("OmicsPLS"). Electronic supplementary material: The online version of this article (10.1186/s12859-018-2371-3) contains supplementary material, which is available to authorized users.
Background: Hypertrophic cardiomyopathy (HCM) is the most common genetic disease of the cardiac muscle, frequently caused by mutations in MYBPC3. However, little is known about the upstream pathways and key regulators causing the disease. Therefore, we employed a multi-omics approach to study the pathomechanisms underlying HCM comparing patient hearts harboring MYBPC3 mutations to control hearts. Results: Using H3K27ac ChIP-seq and RNA-seq we obtained 9310 differentially acetylated regions and 2033 differentially expressed genes, respectively, between 13 HCM and 10 control hearts. We obtained 441 differentially expressed proteins between 11 HCM and 8 control hearts using proteomics. By integrating multi-omics datasets, we identified a set of DNA regions and genes that differentiate HCM from control hearts and 53 protein-coding genes as the major contributors. This comprehensive analysis consistently points toward altered extracellular matrix formation, muscle contraction, and metabolism. Therefore, we studied enriched transcription factor (TF) binding motifs and identified 9 motif-encoded TFs, including KLF15, ETV4, AR, CLOCK, ETS2, GATA5, MEIS1, RXRA, and ZFX. Selected candidates were examined in stem cell-derived cardiomyocytes with and without mutated MYBPC3. Furthermore, we observed an abundance of acetylation signals and transcripts derived from cardiomyocytes compared to non-myocyte populations. Conclusions: By integrating histone acetylome, transcriptome, and proteome profiles, we identified major effector genes and protein networks that drive the pathological changes in HCM with mutated MYBPC3. Our work identifies 38 highly affected protein-coding genes as potential plasma HCM biomarkers and 9 TFs as potential upstream regulators of these pathomechanisms that may serve as possible therapeutic targets.