The quantitative characterization of transcriptional control by histone modifications has been attempted by many computational studies, but most of them focus only on narrow, linear genomic regions around promoters, leaving room for improvement. We present Chromoformer, a transformer-based, three-dimensional chromatin conformation-aware deep learning architecture that achieves state-of-the-art performance in the quantitative deciphering of the histone codes in gene regulation. The core of the Chromoformer architecture lies in three variants of the attention operation, each specialized to model an individual level of the transcriptional regulatory hierarchy, from core promoters to distal elements in contact with promoters through three-dimensional chromatin interactions. In-depth interpretation of Chromoformer reveals that it adaptively utilizes the long-range dependencies between histone modifications associated with transcription initiation and elongation. We also show that the quantitative kinetics of transcription factories and Polycomb group bodies can be captured by Chromoformer. Together, our study highlights the great advantage of attention-based deep modeling of complex interactions in epigenomes.
Phased DNA methylation states within bisulfite sequencing reads are a valuable source of information that can be used to estimate epigenetic diversity across cells as well as epigenomic instability in individual cells. Various measures capturing the heterogeneity of DNA methylation states have been proposed over the past decade. However, in routine analyses of DNA methylation, this heterogeneity is often ignored by computing average methylation levels at CpG sites, even though such information exists in bisulfite sequencing data in the form of phased methylation states, or methylation patterns. In this study, to facilitate the application of DNA methylation heterogeneity measures in downstream epigenomic analyses, we present a Rust-based, extremely fast and lightweight bioinformatics toolkit called Metheor. Because the analysis of DNA methylation heterogeneity requires the examination of pairs or groups of CpGs throughout the genome, existing software suffers from a high computational burden, which makes large-scale DNA methylation heterogeneity studies almost intractable for researchers with limited resources. Here, we benchmark the performance of Metheor against existing code implementations of DNA methylation heterogeneity measures in three different scenarios of simulated bisulfite sequencing datasets. Metheor was shown to dramatically reduce execution time by up to 300-fold and memory footprint by up to 60-fold, while producing results identical to those of the original implementations, thereby facilitating large-scale studies of DNA methylation heterogeneity profiles. To demonstrate the utility of the low computational burden of Metheor, we show that the methylation heterogeneity profiles of 928 cancer cell lines can be computed with standard computing resources. With those profiles, we reveal the association between DNA methylation heterogeneity and various omics features.
Source code for Metheor is at https://github.com/dohlee/metheor and is freely available under the GPL-3.0 license.
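To illustrate why averaging methylation levels at CpG sites can hide the heterogeneity carried by phased methylation patterns, the following is a minimal sketch rather than Metheor's actual implementation or interface. It assumes each read is encoded as a string of `1` (methylated) and `0` (unmethylated) CpG calls, and it uses the proportion of discordant reads as one example of a heterogeneity measure; the function names, the toy reads, and the `min_cpgs=4` threshold are all illustrative assumptions.

```python
from typing import List


def mean_methylation(reads: List[str]) -> float:
    """Average methylation level: methylated CpG calls over all CpG calls."""
    calls = "".join(reads)
    return calls.count("1") / len(calls)


def proportion_discordant_reads(reads: List[str], min_cpgs: int = 4) -> float:
    """Fraction of eligible reads that mix methylated and unmethylated CpGs.

    A read is eligible if it covers at least `min_cpgs` CpGs (assumed threshold).
    """
    eligible = [r for r in reads if len(r) >= min_cpgs]
    if not eligible:
        raise ValueError("no read covers enough CpGs")
    discordant = sum(1 for r in eligible if "0" in r and "1" in r)
    return discordant / len(eligible)


# Two hypothetical loci with the same average methylation level.
concordant_locus = ["1111", "1111", "0000", "0000"]  # every read is uniform
mixed_locus = ["1010", "0101", "1100", "0011"]       # every read is mixed

for name, reads in [("concordant", concordant_locus), ("mixed", mixed_locus)]:
    print(name, mean_methylation(reads), proportion_discordant_reads(reads))
```

Both toy loci have an average methylation level of 0.5, yet no read is discordant in the first locus and every read is discordant in the second, which is the kind of difference that only read-level (phased) information can expose.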
Phased DNA methylation states within bisulfite sequencing reads are a valuable source of information that can be used to estimate epigenetic diversity across cells as well as epigenomic instability in individual cells. Various measures capturing the heterogeneity of DNA methylation states have been proposed over the past decade. However, in routine analyses of DNA methylation, this heterogeneity is often overlooked by computing average methylation levels at CpG sites. In this study, to facilitate the application of DNA methylation heterogeneity measures in downstream epigenomic analyses, we present a Rust-based, extremely fast and lightweight bioinformatics toolkit called Metheor. Here, we benchmark the performance of Metheor against existing code implementations of DNA methylation heterogeneity measures in three different scenarios of simulated bisulfite sequencing datasets. Metheor was shown to dramatically reduce execution time by up to 300-fold and memory footprint by up to 60-fold, while producing the same results as the original implementations. Source code for Metheor is at https://github.com/dohlee/metheor and is freely available for non-commercial users.
The quantitative characterization of transcriptional control by histone modifications (HMs) has been attempted by many computational studies, but most of them still exploit only partial aspects of the intricate mechanisms involved in gene regulation, leaving room for improvement. We present Chromoformer, a new transformer-based deep learning architecture that achieves state-of-the-art performance in the quantitative deciphering of the histone codes of gene regulation. The core of the Chromoformer architecture lies in three variants of the attention operation, each specialized to model an individual level of the hierarchy of three-dimensional (3D) transcriptional regulation, including (1) histone codes at core promoters, (2) pairwise interaction between a core promoter and a distal cis-regulatory element mediated by 3D chromatin interactions, and (3) the collective effect of the pairwise cis-regulations. In-depth interpretation of the trained model behavior based on attention scores suggests that Chromoformer adaptively exploits the distant dependencies between HMs associated with transcription initiation and elongation. We also demonstrate that the quantitative kinetics of transcription factories and Polycomb group bodies, in which coordinated gene regulation occurs through spatial sequestration of genes with regulatory elements, can be captured by Chromoformer. Together, our study shows the great power of attention-based deep learning as a versatile modeling approach for the complex epigenetic landscape of gene regulation and highlights its potential as an effective toolkit that facilitates scientific discoveries in computational epigenetics.
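As a rough illustration of the attention operation that the three variants build on, the sketch below implements plain scaled dot-product attention in NumPy. It is not the Chromoformer implementation: learned projections, multiple heads, and all model-specific details are omitted, and the array sizes, variable names, and random features are hypothetical. Self-attention over promoter bins loosely corresponds to level (1), and cross-attention from promoter bins to the bins of a distal element loosely corresponds to level (2).

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention(queries: np.ndarray, keys: np.ndarray, values: np.ndarray):
    """Scaled dot-product attention; returns the output and attention weights."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)  # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ values, weights


rng = np.random.default_rng(0)
# Hypothetical inputs: rows are genomic bins, columns are HM-derived features.
promoter = rng.normal(size=(5, 8))  # 5 bins around a core promoter
distal = rng.normal(size=(3, 8))    # 3 bins of one distal element

# Self-attention: promoter bins attend to each other.
self_out, self_weights = attention(promoter, promoter, promoter)

# Cross-attention: promoter bins (queries) attend to distal bins (keys/values).
cross_out, cross_weights = attention(promoter, distal, distal)

print(self_weights.shape, cross_weights.shape)
```

Each row of the weight matrices is a distribution over the attended bins, which is the kind of quantity an attention-score-based interpretation would inspect.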
Three-dimensional (3D) genome states are closely related to cancer development. Nonetheless, to the best of our knowledge, 3D genome information has not been clinically utilized, owing to the high cost of producing Hi-C data, an obvious source of 3D genome information. Therefore, there is a need for a novel metric computable from more easily accessible 3D genome-related data for the clinical utilization of 3D genome information. Here, we propose a method to extract 3D genome-aware epigenetic features from DNA methylation data and use these features for deep learning-based survival prediction. These features are derived from 3D genome structures that are reconstructed from DNA methylation data at the individual level. The results showed that the use of 3D genome-aware features contributed to more accurate risk prediction across seven cancer types, suggesting the effectiveness of the knowledge about 3D genome structure embedded in these features. Deeper biological investigation revealed that altered DNA methylation levels in the high-risk group could be related to anomalously activated genes involved in cancer-related pathways. Altogether, the risk predicted from 3D genome-aware epigenetic features showed its significance as a survival predictor in seven cancer types, along with its biological importance.