Motivation: Transcription factor binding site (TFBS) prediction is a crucial step in revealing the functions of transcription factors (TFs) from high-throughput sequencing data. The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) provides insight into TFBSs and nucleosome positioning by probing open chromatin, and, compared with traditional technologies, it can reveal multiple TFBSs simultaneously. Existing tools based on convolutional neural networks (CNNs) can only find TFBSs of a fixed length in ATAC-seq data. Graph neural networks (GNNs) can be regarded as an extension of CNNs and have great potential for finding multiple TFBSs of different lengths in ATAC-seq data.
Results: We develop a motif predictor called MMGraph, based on a three-layer GNN and the coexisting probability of k-mers, for finding multiple motifs in ATAC-seq data. Experiments conducted on 88 ATAC-seq datasets indicate that MMGraph achieves the best area of eight metrics radar (AEMR) score, 2.31, and finds 207 multiple motifs of higher quality than those found by other existing tools.
Availability: MMGraph is distributed as a Python package, available at https://github.com/zhangsq06/MMGraph.git
Supplementary information: Supplementary data are available at Bioinformatics online.
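The abstract does not define how the coexisting probability of k-mers is computed. As a minimal sketch, assuming that the weight of an edge between two k-mers is the fraction of input sequences (for example, ATAC-seq peak sequences) that contain both, a weighted k-mer graph could be built as follows. The function names `kmers` and `kmer_cooccurrence_graph` and the `min_prob` threshold are hypothetical and are not part of MMGraph's API.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_cooccurrence_graph(seqs, k=3, min_prob=0.0):
    """Build a weighted k-mer graph from a list of sequences.

    Assumed (hypothetical) definition: the weight of edge (a, b), with a < b,
    is the fraction of sequences in which both k-mers occur. Only edges whose
    weight is strictly greater than min_prob are kept.
    """
    n = len(seqs)
    pair_counts = Counter()
    for s in seqs:
        # Count each unordered k-mer pair at most once per sequence.
        present = sorted(kmers(s, k))
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return {pair: c / n for pair, c in pair_counts.items() if c / n > min_prob}


if __name__ == "__main__":
    seqs = ["ACGTAC", "ACGTTT", "GGGACG"]
    graph = kmer_cooccurrence_graph(seqs, k=3, min_prob=0.5)
    print(graph)
```

For the three toy sequences above, only the pair ("ACG", "CGT") occurs in more than half of the sequences (two of three), so it is the only edge kept at `min_prob=0.5`. A weighted edge list of this kind could serve as the adjacency input to a GNN; MMGraph's actual edge definition and normalization may differ.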
Glioma is the main type of malignant brain tumor in adults, and the isocitrate dehydrogenase (IDH) mutation status strongly affects the diagnosis, treatment, and prognosis of gliomas. Radiographic medical imaging provides a noninvasive platform for sampling both inter- and intralesion heterogeneity of gliomas, and previous research has shown that the IDH genotype can be predicted from the fusion of multimodality radiology images. Both the features of medical images and the IDH genotype are vital for medical treatment; however, a multitask framework for segmenting the lesion areas of gliomas and predicting the IDH genotype is still lacking. In this paper, we propose a novel three-dimensional (3D) multitask deep learning model for segmentation and genotype prediction (SGPNet). Residual units are also introduced into SGPNet, allowing the output blocks to extract hierarchical features for different tasks and facilitating information propagation. Compared with previous models, our model reduces the classification error rate by 26.6% on the datasets of the Multimodal Brain Tumor Segmentation Challenge (BRATS) 2020 and The Cancer Genome Atlas (TCGA) glioma databases. Furthermore, we are the first to investigate in practice how lesion areas influence the performance of IDH genotype prediction, by setting different groups of learning targets. The experimental results indicate that information from lesion areas is more important for IDH genotype prediction. Our framework is effective and generalizable and can serve as a highly automated tool for clinical decision making.
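The abstract does not state SGPNet's training objective. A common way to train one network for both tasks, assumed here rather than taken from the paper, is to add a soft Dice loss for the lesion mask to a binary cross-entropy loss for the IDH genotype. The sketch below works on flattened voxel lists in pure Python; `soft_dice_loss`, `binary_cross_entropy`, `multitask_loss`, `seg_weight`, and `cls_weight` are hypothetical names.

```python
import math


def soft_dice_loss(pred, target, eps=1e-6):
    """1 - soft Dice coefficient between voxel probabilities and a binary mask.

    pred and target are flattened sequences of equal length.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def binary_cross_entropy(prob, label, eps=1e-7):
    """Cross-entropy for one predicted probability of the IDH-mutant class."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multitask_loss(seg_pred, seg_mask, idh_prob, idh_label,
                   seg_weight=1.0, cls_weight=1.0):
    """Hypothetical joint objective: weighted sum of the two task losses."""
    return (seg_weight * soft_dice_loss(seg_pred, seg_mask)
            + cls_weight * binary_cross_entropy(idh_prob, idh_label))


if __name__ == "__main__":
    # A perfect mask with an uninformative genotype prediction (p = 0.5).
    print(multitask_loss([1, 0, 1], [1, 0, 1], 0.5, 1))
```

Changing `seg_weight` and `cls_weight` trades segmentation quality against genotype accuracy during training; the weighting and loss terms actually used by SGPNet are not given in the abstract.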
Promoters are non-coding DNA regions around the transcription start site that regulate the gene transcription process. Because of their key role in gene function and transcriptional activity, accurately predicting promoter sequences and their core elements is a crucial research area in bioinformatics. Models based on machine learning and deep learning have already been developed for promoter prediction. However, these models cannot mine deeper biological information from promoter sequences or account for the complex relationships among them. In this work, we propose a novel prediction model called PromGER to predict eukaryotic promoter sequences. First, PromGER uses four types of feature-encoding methods to extract local information within each promoter sequence. Second, all promoter sequences are constructed into a graph according to the potential relationships among them. Third, graph-embedding methods at three different scales are applied to obtain global feature information from the graph more comprehensively. Finally, combining the local and global features of the sequences, PromGER analyzes and predicts promoter sequences through a tree-based ensemble-learning framework. Compared with seven existing methods, PromGER improved average specificity by 13%, accuracy by 10%, Matthews correlation coefficient by 16%, precision by 4%, F1 score by 6%, and AUC by 9%. In addition, PromGER was interpreted using t-distributed stochastic neighbor embedding (t-SNE) and SHapley Additive exPlanations (SHAP) value analysis, which demonstrates the interpretability of the model.
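PromGER is compared with other methods on specificity, accuracy, Matthews correlation coefficient (MCC), precision, F1 score, and AUC. The sketch below computes these metrics from binary labels using their standard definitions; the function names are hypothetical, and the abstract does not specify PromGER's evaluation protocol (for example, cross-validation or how results are averaged).

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 = promoter."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(a, b):
    """Divide, returning 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "specificity": safe_div(tn, tn + fp),
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn, mcc_denom),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as 1/2 (the Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))
```

For `y_true = [1, 1, 1, 0, 0, 0]` and `y_pred = [1, 1, 0, 0, 0, 1]` (TP = 2, TN = 2, FP = 1, FN = 1), specificity, accuracy, precision, recall, and F1 all equal 2/3, while MCC is (2·2 − 1·1)/√(3·3·3·3) = 1/3. The pairwise AUC is quadratic in the number of samples, which is fine for illustration but slower than a sort-based implementation on large test sets.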