Gene expression is a biological process regulated at different molecular levels, including chromatin accessibility, transcription, and RNA maturation and transport. In addition, these regulatory mechanisms have strong links with cellular metabolism. Here we present a multi-omics dataset that captures different aspects of this multi-layered process in yeast. We obtained RNA-seq, metabolomics, and H4K12ac ChIP-seq data for wild-type and mip6Δ strains during a heat-shock time course. Mip6 is an RNA-binding protein that contributes to RNA export during environmental stress, so the mip6Δ strain is informative about how post-transcriptional regulation contributes to cellular adaptation to environmental change. The experiment was performed in quadruplicate, and all omics measurements were obtained from the same biological samples, which facilitates data integration and analysis with covariance-based methods. We validate our dataset by showing that the ChIP-seq, RNA-seq and metabolomics signals recapitulate existing knowledge about the response of ribosomal genes and the contribution of trehalose metabolism to heat stress. Raw data, processed data and preprocessing scripts are publicly available.
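Because all omic layers in this dataset come from the same biological samples, the rows of the RNA-seq and metabolomics matrices can be paired one-to-one, which is exactly what covariance-based integration needs. The following is a minimal NumPy sketch, assuming each omic is arranged as a samples × features matrix with rows in the same sample order; the function names are hypothetical and are not part of the released preprocessing scripts.

```python
import numpy as np


def cross_covariance(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Feature-by-feature covariance between two omic blocks.

    X: samples x features_X (e.g. RNA-seq), Y: samples x features_Y
    (e.g. metabolomics). Row i of X and row i of Y must describe the
    same biological sample; the shared-sample design provides this pairing.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same samples (rows)")
    Xc = X - X.mean(axis=0)  # center each feature across samples
    Yc = Y - Y.mean(axis=0)
    return Xc.T @ Yc / (X.shape[0] - 1)


def leading_shared_axis(X: np.ndarray, Y: np.ndarray):
    """First pair of weight vectors whose X and Y sample scores have maximal
    covariance, i.e. the first singular vectors of the cross-covariance."""
    C = cross_covariance(X, Y)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, 0], Vt[0], s[0]
```

The leading singular vectors give the pair of transcript and metabolite weightings whose sample scores co-vary most strongly, the quantity that PLS-type covariance methods build on. If the rows came from different samples, this pairing, and therefore the cross-covariance, would be meaningless.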
The Yeast Metabolic Cycle (YMC) is a model system in which the levels of around 60% of yeast transcripts cycle over time. The spatial and temporal resolution provided by the YMC has revealed that changes in the yeast metabolic landscape and chromatin status can be related to cycling gene expression. However, the interplay between histone modifications and transcription factor activity during the YMC is still poorly understood. Here we apply a statistical approach that integrates chromatin state (ChIP-seq) and gene expression (RNA-seq) data to investigate transcriptional control during the YMC. Using two multivariate regression methodologies, N-way Partial Least Squares (N-PLS) and Multi-Omics REgulation (MORE), we assessed the contribution of histone marks and transcription factors to the regulation of gene expression in the YMC. We found that H3K18ac and H3K9ac were the most important histone modifications, whereas Sfp1, Hfi1, Pip2, Mig2, and Yhp1 emerged as the most relevant transcription factors. A significant association in the co-regulation of gene expression was found between H3K18ac and the transcription factors Pip2 (involved in fatty acid metabolism), Xbp1 (a transcriptional repressor of cyclin genes, implicated in the regulation of carbohydrate and amino acid metabolism), and Hfi1 (involved in the formation of the SAGA complex). These results highlight the crucial role of histone lysine acetylation in the regulation of gene expression in the YMC through the coordinated action of transcription factors and lysine acetyltransferases.
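N-PLS generalises PLS regression to multiway arrays (for example genes × regulators × time points), and MORE is a separate multi-omics regression framework; neither is reproduced here. To illustrate how a PLS model can rank regulators such as histone marks, the sketch below fits ordinary two-way, single-response PLS (PLS1, NIPALS algorithm) in NumPy and reports variable importance in projection (VIP) scores. Using VIP as the importance measure, and the predictor/response layout described in the docstring, are assumptions for illustration, not the procedure used in the study.

```python
import numpy as np


def pls1_with_vip(X, y, n_components):
    """Single-response PLS regression (NIPALS) plus VIP scores.

    X: observations x predictors (hypothetically, regulator signals such
       as histone-mark and TF ChIP-seq values);
    y: response of length n_observations (e.g. expression values).
    Returns (beta, vip, x_mean, y_mean); predictions are
    (X_new - x_mean) @ beta + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # centered residual blocks, deflated below
    W, P, q, ss = [], [], [], []
    for _ in range(n_components):
        w = E.T @ f                # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                  # X scores
        tt = t @ t
        p = E.T @ t / tt           # X loadings
        c = (f @ t) / tt           # y loading
        E = E - np.outer(t, p)     # deflate X
        f = f - c * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
        ss.append(c ** 2 * tt)     # y sum of squares explained by this component
    W, P = np.column_stack(W), np.column_stack(P)
    q, ss = np.array(q), np.array(ss)
    beta = W @ np.linalg.solve(P.T @ W, q)  # coefficients in centered-X space
    # VIP: each predictor's weight, averaged over components by explained y variance
    vip = np.sqrt(X.shape[1] * ((W ** 2) @ ss) / ss.sum())
    return beta, vip, x_mean, y_mean
```

Because squared VIP scores sum to the number of predictors, scores are comparable within one model, and a VIP above 1 is a common rule of thumb for an influential predictor. With as many components as predictors (and full-rank data), PLS1 reduces to ordinary least squares.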
The diversity of omic technologies has expanded in recent years, together with the number of omic data integration strategies. However, multiomic data generation is costly, and many research groups cannot afford projects in which many different omic techniques are applied, at least not at the same time. As most researchers share their data in public repositories, different omic datasets of the same biological system obtained in different labs can be combined to construct a multiomic study. However, data obtained in different labs or at different times are typically subject to batch effects that must be removed for successful data integration. While there are methods to correct batch effects in the same data type obtained in different studies, they cannot be applied to correct lab or batch effects across omics. This impairs multiomic meta-analysis. Fortunately, in many cases at least one omics platform, e.g. gene expression, is repeatedly measured across labs, together with additional omic modalities that are specific to each study. This creates an opportunity for batch analysis. We have developed MultiBaC (Multiomics Batch-effect Correction), a strategy to correct batch effects in multiomic datasets distributed across different labs or data acquisition events. Our strategy relies on the existence of at least one shared data type, which allows data prediction across omics. We validate this approach both on simulated data and on a case where the multiomic design is fully shared by two labs, so that batch effect correction within the same omic modality using traditional methods can be compared with the MultiBaC correction across data types. Finally, we apply MultiBaC to a true multiomic data integration problem to show that it improves the detection of meaningful biological effects.
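The key idea above is that a data type measured in every lab lets one omic be predicted from another, after which batch effects can be estimated within a single omic space. The sketch below illustrates that idea only loosely: it swaps in ordinary least squares (via `numpy.linalg.lstsq`) for the cross-omic models MultiBaC actually uses, and simple per-batch mean-centering for its batch-removal step. All function names are hypothetical, samples are rows, and the mean-centering assumes each lab profiled comparable biological conditions; otherwise it would also remove real biology.

```python
import numpy as np


def fit_cross_omic(X_shared, Y_specific):
    """Least-squares map (with intercept) from the shared omic
    (samples x genes) to a lab-specific omic measured on the same samples."""
    X1 = np.column_stack([np.ones(len(X_shared)), X_shared])
    coef, *_ = np.linalg.lstsq(X1, Y_specific, rcond=None)
    return coef


def predict_cross_omic(coef, X_shared):
    """Predict the lab-specific omic from shared-omic profiles."""
    X1 = np.column_stack([np.ones(len(X_shared)), X_shared])
    return X1 @ coef


def center_by_batch(data, batches):
    """Subtract each batch's mean profile (a location-only correction)."""
    data = np.asarray(data, dtype=float)
    batches = np.asarray(batches)
    out = data.copy()
    for b in np.unique(batches):
        mask = batches == b
        out[mask] -= data[mask].mean(axis=0)
    return out


def correct_across_labs(X_A, Y_A, X_B):
    """Lab A measured the shared omic X and a specific omic Y; lab B measured
    only X. Predict Y for lab B, stack it with lab A's Y, remove lab means."""
    Y_B_hat = predict_cross_omic(fit_cross_omic(X_A, Y_A), X_B)
    stacked = np.vstack([Y_A, Y_B_hat])
    labels = np.array(["A"] * len(Y_A) + ["B"] * len(Y_B_hat))
    return center_by_batch(stacked, labels), labels
```

With many more genes than samples, `lstsq` returns the minimum-norm solution, so a regularised or latent-variable model would be the more sensible choice for real data.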
Motivation: Batch effects in omics datasets are usually a source of technical noise that masks the biological signal and hampers data analysis. Batch effect removal has been widely addressed for individual omics technologies. However, multi-omic datasets may combine data obtained in different batches, where omics type and batch are often confounded. Moreover, systematic biases may be introduced unnoticed during data acquisition, creating a hidden batch effect. Current methods fail to address batch effect correction in these cases. Results: In this paper we introduce the MultiBaC R package, a tool for batch effect removal in multi-omics and hidden batch effect scenarios. The package includes a variety of graphical outputs for model validation and assessment of the batch effect correction. Availability: The MultiBaC package is available on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/MultiBaC.html) and GitHub (https://github.com/ConesaLab/MultiBaC.git). Supplementary information: Supplementary data are available at Bioinformatics online.