We present STREAM, an algorithm for enhancer-driven gene regulatory network (eGRN) inference from transcriptome and chromatin accessibility data profiled in the same cells. The algorithm improves the prediction accuracy of relations among transcription factors (TFs), enhancers, and genes through two new ideas: (i) a Steiner forest problem model that identifies the set of high-confidence enhancer-gene relations underlying a context-specific functional gene module; and (ii) a hybrid biclustering pipeline integrated with submodular optimization that infers eGRNs by identifying the optimal set of hybrid biclusters, each of which represents genes co-regulated by the same TF and co-accessible enhancers bound by that TF over a cell subpopulation. These two ideas are embedded in an iterative framework that finds patterns in a paired transcriptome matrix and chromatin accessibility matrix. Benchmarking shows that STREAM significantly improved performance, assessed by F-score, precision, or recall, compared with four state-of-the-art tools on nine datasets. Applying STREAM to an Alzheimer's disease dataset, we identified TF-enhancer-gene relations associated with pseudotime and investigated how enhancer-gene relations change along cell lineages. Applying STREAM to a diffuse small lymphocytic lymphoma dataset, we uncovered key TF-enhancer-gene relations and TF cooperation underlying tumor cells.
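The benchmarking metrics named in this abstract can be illustrated with a minimal sketch. It assumes that predicted and reference enhancer-gene relations are compared as sets of (enhancer, gene) pairs and that the F-score is the balanced F1; the abstract does not state either detail, and the identifiers below (`enh1`, `GENE_A`, and so on) are made up for illustration.

```python
def precision_recall_f1(predicted, reference):
    """Compare two collections of (enhancer, gene) pairs as sets.

    Assumption: a predicted relation counts as correct only if the exact
    pair also appears in the reference set.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data, not taken from the paper.
predicted = {("enh1", "GENE_A"), ("enh2", "GENE_B"),
             ("enh3", "GENE_C"), ("enh4", "GENE_D")}
reference = {("enh1", "GENE_A"), ("enh2", "GENE_B"), ("enh5", "GENE_E")}

precision, recall, f1 = precision_recall_f1(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Two of the four predictions are correct and two of the three reference relations are recovered, so precision and recall differ, which is why a combined score such as F1 is typically reported alongside them.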
Sequence motif discovery algorithms identify novel DNA patterns with significant biological roles, such as transcription factor (TF) binding site motifs. Chromatin accessibility data, accumulated through the assay for transposase-accessible chromatin with sequencing (ATAC-seq), have enriched the resources available for motif discovery. However, computational efforts in ATAC-seq data analysis mainly target TF binding activity footprinting rather than motif prediction. Here, we introduce CEMIG, an algorithm that predicts and characterizes TF binding sites using De Bruijn and Hamming distance graph models. Evaluation on 129 ATAC-seq datasets from the Cistrome Data Browser suggests that CEMIG outperforms three widely used methods on four metrics. We also use CEMIG to predict cell-type-specific and shared TF motifs in GM12878 and K562 cells, facilitating comprehensive gene expression and functional genomics analysis.
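For readers unfamiliar with the two graph models named in this abstract, the sketch below builds both over k-mers from toy sequences. It shows the textbook constructions only and is not CEMIG's implementation; the function names, the choice of k, and the distance threshold are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_edges(sequences, k):
    """De Bruijn graph: nodes are (k-1)-mers, and every k-mer contributes
    one edge from its prefix to its suffix. Edge weights count occurrences."""
    edges = defaultdict(int)
    for seq in sequences:
        for km in kmers(seq, k):
            edges[(km[:-1], km[1:])] += 1
    return dict(edges)


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def hamming_graph(kmer_set, max_dist=1):
    """Hamming distance graph: connect k-mers that differ in at most
    max_dist positions (quadratic in the number of k-mers)."""
    adjacency = {km: set() for km in kmer_set}
    for a, b in combinations(sorted(kmer_set), 2):
        if hamming(a, b) <= max_dist:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical toy sequences that differ by one substitution.
seqs = ["ACGTAC", "ACGTTC"]
k = 4
edges = de_bruijn_edges(seqs, k)
adjacency = hamming_graph({km for s in seqs for km in kmers(s, k)})
```

In this toy case the De Bruijn graph captures how k-mers overlap along each sequence, while the Hamming graph links near-identical k-mers such as `CGTA` and `CGTT`, which is the kind of grouping a motif finder needs to tolerate mismatches between binding site instances.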
Identifying precise transcription factor binding sites (TFBSs) or regulatory DNA motifs is fundamental to studying transcriptional regulatory mechanisms in cells and to constructing regulatory networks. Current motif-searching algorithms focus on the analysis of ChIP-enriched peaks but cannot integrate the ChIP signal at nucleotide resolution. We present TESA, a weighted two-stage alignment tool. Our framework implements an analysis workflow from experimental datasets to TFBS prediction results. It applies a binomial distribution model and a graph searching model to ChIP-exonuclease (ChIP-exo) read depth and sequence data. TESA can effectively measure the likelihood that each position in a given promoter sequence is an actual TFBS and can predict statistically significant TFBS sequence segments. The algorithm substantially improves prediction accuracy and extends the scope of applicability of existing approaches. We apply the framework to a collection of 20 ChIP-exo datasets of E. coli from proChIPdb and evaluate its prediction performance against three existing programs. The evaluation indicates that TESA is more accurate at identifying regulatory motifs in prokaryotic genomes.
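One way to read the binomial distribution model mentioned in this abstract is as a per-position significance test on read depth. The sketch below is an assumption-laden illustration, not TESA's actual model: it supposes that, under the null hypothesis, each read lands on any of the positions in a promoter with equal probability, and it flags positions whose depth has a small upper-tail probability. The depths and the 0.01 threshold are invented for the example.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the tail."""
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical read depth at each position of a short promoter window.
depths = [1, 0, 2, 1, 9, 1, 0, 1, 0, 1]
total_reads = sum(depths)
p_null = 1 / len(depths)  # assumed null: reads spread uniformly over positions

p_values = [binomial_upper_tail(d, total_reads, p_null) for d in depths]

alpha = 0.01  # illustrative threshold, not from the paper
significant = [i for i, pv in enumerate(p_values) if pv < alpha]
print(significant)
```

Only the position with depth 9 falls below the threshold here. A real analysis would also need to correct for multiple testing across positions, which this sketch omits.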