Many deep learning approaches have been proposed to connect DNA sequence, epigenetic profiles, chromatin organization, and transcription activity. While these approaches achieve satisfactory performance in predicting one modality from another, the learned representations do not generalize across predictive tasks or across cell types. In this paper, we propose EPCOT, a deep learning approach that employs a pre-training and fine-tuning framework to comprehensively predict the epigenome, chromatin organization, transcriptome, and enhancer activity within a single framework, and that also generalizes to new cell types. EPCOT not only achieves superior predictive performance in individual predictive tasks but also produces globally optimized sequence representations that generalize across different predictive tasks. Interpreting the EPCOT model also allows us to provide a number of tools and services to the research community, including mapping between different genomic modalities, identifying TF sequence binding patterns, and analyzing cell-type-specific TF impacts on enhancer activity.
Many deep learning approaches have been proposed to predict epigenetic profiles, chromatin organization, and transcription activity. While these approaches achieve satisfactory performance in predicting one modality from another, the learned representations do not generalize across predictive tasks or across cell types. In this paper, we propose EPCOT, a deep learning approach that employs a pre-training and fine-tuning framework and can accurately and comprehensively predict multiple modalities, including the epigenome, chromatin organization, transcriptome, and enhancer activity, for new cell types while requiring only cell-type-specific chromatin accessibility profiles. Many of these predicted modalities, such as Micro-C and ChIA-PET, are expensive to obtain in practice, so in silico prediction with EPCOT should be helpful. Furthermore, the pre-training and fine-tuning framework allows EPCOT to identify generic representations that generalize across different predictive tasks. Interpreting EPCOT models also provides biological insights, including mapping between different genomic modalities, identifying TF sequence binding patterns, and analyzing cell-type-specific TF impacts on enhancer activity.
Quality assurance techniques are increasingly in demand in additive manufacturing. Going beyond most existing research, which focuses on melt pool temperature monitoring, we develop a new method that monitors in-situ optical emission spectra signals, which have shown potential for detecting microscopic pores. The concept is to extract features from the optical emission spectra via deep auto-encoders and then cluster the features into two quality groups, using both unlabelled and labelled samples in a semi-supervised manner. The method is integrated with multitask learning so that it can be adapted to samples collected from multiple processes. Both a simulation example and a case study are performed to demonstrate the effectiveness of the proposed method.
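As a rough illustration of the feature-extraction and two-group clustering idea, the sketch below uses synthetic spectra and deliberately swaps in simpler components: a PCA projection stands in for the deep auto-encoder, and a seeded two-means procedure stands in for the semi-supervised clustering. The multitask component is not covered. All function names, data, and parameters here are hypothetical and are not taken from the paper.

```python
import numpy as np

def pca_features(spectra, n_components=2):
    """Linear stand-in for an auto-encoder bottleneck: project the
    centered spectra onto their top principal components."""
    centered = spectra - spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def seeded_two_means(features, labels, n_iter=20):
    """Two-cluster k-means. `labels` holds 0 or 1 for labelled samples
    and -1 for unlabelled ones. Labelled samples seed the centroids and
    always keep their known group (assumes each group has a label)."""
    known = labels >= 0
    centroids = np.stack([features[labels == k].mean(axis=0) for k in (0, 1)])
    assign = labels.copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        assign[known] = labels[known]  # labelled samples never switch group
        centroids = np.stack([features[assign == k].mean(axis=0) for k in (0, 1)])
    return assign

# Synthetic data (an assumption): 40 spectra per group, 50 channels each;
# the second group carries an extra emission peak on channels 20-25.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 40)
spectra = rng.normal(0.0, 0.1, size=(80, 50))
spectra[truth == 1, 20:26] += 1.0

# Only three samples per group are labelled; the rest are marked -1.
labels = np.full(80, -1)
labels[:3] = 0
labels[40:43] = 1

assign = seeded_two_means(pca_features(spectra), labels)
```

Here `assign` holds the predicted quality group for every sample, including the unlabelled ones. Fixing the labelled samples to their known group is one simple way to let a handful of labels steer an otherwise unsupervised clustering.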
The human epigenome and transcription activity have been characterized by a number of sequence-based deep learning approaches that utilize only DNA sequences. However, transcription factors interact with each other, and their collaborative regulatory activities go beyond the linear DNA sequence. Therefore, leveraging the informative 3D chromatin organization to investigate the collaborations among transcription factors is critical. We developed ECHO, a graph-based neural network, to predict chromatin features and characterize the collaboration among them by incorporating 3D chromatin organization from 200-bp high-resolution Micro-C contact maps. ECHO predicted 2,583 chromatin features with significantly higher average AUROC and AUPR than the best sequence-based model. We observed that chromatin contacts at different distances affected the prediction of different types of chromatin features in diverse ways, suggesting complex and divergent collaborative regulatory mechanisms. Moreover, ECHO was interpretable via gradient-based attribution methods. The attributions on chromatin contacts identified important contacts relevant to chromatin features. The attributions on DNA sequences identified TF binding motifs and TF collaborative binding. Furthermore, combining the attributions on contacts and sequences revealed important sequence patterns in the neighborhood that are relevant to a target sequence’s chromatin feature prediction.
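To make the idea of gradient-based attribution on DNA sequences concrete, here is a minimal sketch of one common variant, gradient times input, applied to a toy logistic model over a one-hot encoded sequence. This is not ECHO's architecture or its attribution code: the model, the weights, and the motif are hypothetical and chosen only so the result is easy to inspect.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x

def predict(x, w, b):
    """Toy logistic model: sigmoid of a position-specific weighted sum."""
    return 1.0 / (1.0 + np.exp(-(np.sum(x * w) + b)))

def grad_times_input(x, w, b):
    """Gradient of the prediction with respect to the input, multiplied
    elementwise by the input and summed over bases per position."""
    p = predict(x, w, b)
    grad = p * (1.0 - p) * w  # derivative of sigmoid(z) with respect to x
    return (grad * x).sum(axis=1)

# Hypothetical weights: only positions 2-4 matter, and they prefer "TGA".
seq = "ACTGAC"
w = np.zeros((len(seq), 4))
for pos, base in zip(range(2, 5), "TGA"):
    w[pos, BASES.index(base)] = 2.0

scores = grad_times_input(one_hot(seq), w, b=-3.0)
```

In this toy setting the per-position scores are nonzero only at the three positions that match the planted motif, which is the kind of signal that attribution on real models is used to surface as candidate binding motifs.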