Investigating how chromatin organization determines cell-type-specific gene expression remains challenging. Experimental methods for measuring three-dimensional chromatin organization, such as Hi-C, are costly and have technical limitations, restricting their broad application particularly in high-throughput genetic perturbations. We present C.Origami, a multimodal deep neural network that performs de novo prediction of cell-type-specific chromatin organization using DNA sequence and two cell-type-specific genomic features—CTCF binding and chromatin accessibility. C.Origami enables in silico experiments to examine the impact of genetic changes on chromatin interactions. We further developed an in silico genetic screening approach to assess how individual DNA elements may contribute to chromatin organization and to identify putative cell-type-specific trans-acting regulators that collectively determine chromatin architecture. Applying this approach to leukemia cells and normal T cells, we demonstrate that cell-type-specific in silico genetic screening, enabled by C.Origami, can be used to systematically discover novel chromatin regulation circuits in both normal and disease-related biological systems.
Early-stage lung adenocarcinoma is typically treated by surgical resection of the tumor. While in the majority of cases surgery can lead to a cure, approximately 30% of patients progress. Despite intense efforts to map the genetic landscape of early-stage lung tumors, there has been limited success in discovering accurate biomarkers that can predict clinical outcomes. Meanwhile, the role of the tumor-adjacent tissue in cancer progression has been largely ignored. To test whether tumor-adjacent tissue can be informative of progression-free survival and to probe the underlying molecular pathways involved, we designed a multi-omic study in both tumor and matched tumor-adjacent histologically normal lung tissue from the same patient. Our study includes 143 treatment-naive stage I cases with long-term patient follow-up and is, to our knowledge, the largest such study with the longest follow-up. We performed a comprehensive histologic characterization of all tumors, mapped the mutational landscape, and probed the transcriptome of both tumor and adjacent normal tissue. We evaluated the predictive power of each data modality and showed that the transcriptome of tumor-adjacent histologically normal lung tissue is the only reliable predictor of clinical outcome. Unbiased discovery of co-expressed gene modules revealed that inflammatory pathways are upregulated in the tumor-adjacent tissue of patients at high risk for disease progression. Furthermore, single-cell transcriptome analysis in the tumor-adjacent lung demonstrated that progression-associated inflammatory signatures were broadly expressed by both immune and non-immune cells, including mesothelial cells, alveolar type 2 cells, fibroblasts, CD1 dendritic cells and mast cells. Collectively, our studies suggest that molecular profiling of tumor-adjacent tissue can identify patients who are at high risk for disease progression.
Traditional metagenome binning methods cluster contiguous DNA sequences (contigs) based on uncontextualized features of the sequences, an approach that ignores both the semantic relationship between genes and the positional embedding of k-mers. This thesis presents a novel binning method that addresses these concerns. First, a sequence representation model taken from the natural language processing literature, Bidirectional Encoder Representations from Transformers (BERT), is used to generate semantic and positional contig embeddings. Second, two workflows are presented: one applies a hierarchical density-based clustering algorithm to find metagenomic bins, and the other incorporates contig embeddings into a state-of-the-art binner. Experimental results on a publicly available metagenomic dataset show superior clustering for shorter contigs compared to the traditionally used tetranucleotide frequency (TNF), reconstruction of up to 17% more high-precision genomes, and improved semantic understanding of contigs.
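For readers unfamiliar with the TNF baseline mentioned above, the following is a minimal sketch of how a tetranucleotide frequency vector can be computed for a single contig. It is an illustration only and not the thesis's implementation: the function name `tetranucleotide_frequencies` is hypothetical, and the sketch assumes plain overlapping 4-mer counts over the A/C/G/T alphabet without merging reverse complements, which many binners do.

```python
from itertools import product


def tetranucleotide_frequencies(contig: str) -> dict[str, float]:
    """Return normalized counts of all 256 overlapping 4-mers in a contig.

    Hypothetical helper for illustration; reverse complements are NOT merged.
    """
    # Enumerate every possible 4-mer so the feature vector has a fixed length.
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = dict.fromkeys(kmers, 0)

    seq = contig.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        # Windows containing N or other ambiguity codes are skipped.
        if kmer in counts:
            counts[kmer] += 1

    total = sum(counts.values())
    # Normalize to frequencies; an all-zero vector is returned for
    # contigs with no valid 4-mer window.
    return {k: (c / total if total else 0.0) for k, c in counts.items()}


if __name__ == "__main__":
    freqs = tetranucleotide_frequencies("ACGTACGT")
    print(freqs["ACGT"])
```

Because each 4-mer is counted independently of its position and neighbors, this representation carries no contextual information, which is the limitation the abstract attributes to uncontextualized features.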