Abdelkader Baggag scite author profile

RNA post-transcriptional modifications play a crucial role in a myriad of biological processes and cellular functions. To date, more than 160 RNA modifications have been discovered; therefore, accurate identification of RNA-modification sites is fundamental for a better understanding of RNA-mediated biological functions and mechanisms. However, due to limitations in experimental methods, systematic identification of different types of RNA-modification sites remains a major challenge. Recently, more than 20 computational methods have been developed to identify RNA-modification sites in tandem with high-throughput experimental methods, with most of these capable of predicting only single types of RNA-modification sites. These methods show high diversity in their dataset size, data quality, core algorithms, features extracted and feature selection techniques and evaluation strategies. Therefore, there is an urgent need to revisit these methods and summarize their methodologies, in order to improve and further develop computational techniques to identify and characterize RNA-modification sites from the large amounts of sequence data. With this goal in mind, first, we provide a comprehensive survey on a large collection of 27 state-of-the-art approaches for predicting N1-methyladenosine and N6-methyladenosine sites. We cover a variety of important aspects that are crucial for the development of successful predictors, including the dataset quality, operating algorithms, sequence and genomic features, feature selection, model performance evaluation and software utility. In addition, we also provide our thoughts on potential strategies to improve the model performance. Second, we propose a computational approach called DeepPromise based on deep learning techniques for simultaneous prediction of N1-methyladenosine and N6-methyladenosine. To extract the sequence context surrounding the modification sites, three feature encodings, including enhanced nucleic acid composition, one-hot encoding, and RNA embedding, were used as the input to seven consecutive layers of convolutional neural networks (CNNs), respectively. Moreover, DeepPromise further combined the prediction score of the CNN-based models and achieved around 43% higher area under receiver-operating curve (AUROC) for m1A site prediction and 2–6% higher AUROC for m6A site prediction, respectively, when compared with several existing state-of-the-art approaches on the independent test. In-depth analyses of characteristic sequence motifs identified from the convolution-layer filters indicated that nucleotide presentation at proximal positions surrounding the modification sites contributed most to the classification, whereas those at distal positions also affected classification but to different extents. To maximize user convenience, a web server was developed as an implementation of DeepPromise and made publicly available at http://DeepPromise.erc.monash.edu/, with the server accepting both RNA sequences and genomic sequences to allow prediction of two types of putative RNA-modification sites.

show abstract

Computational prediction and interpretation of both general and specific types of promoters in Escherichia coli by exploiting a stacked ensemble-learning framework

Chen

et al. 2020

View full text Add to dashboard Cite

Promoters are short consensus sequences of DNA, which are responsible for transcription activation or the repression of all genes. There are many types of promoters in bacteria with important roles in initiating gene transcription. Therefore, solving promoter-identification problems has important implications for improving the understanding of their functions. To this end, computational methods targeting promoter classification have been established; however, their performance remains unsatisfactory. In this study, we present a novel stacked-ensemble approach (termed SELECTOR) for identifying both promoters and their respective classification. SELECTOR combined the composition of k-spaced nucleic acid pairs, parallel correlation pseudo-dinucleotide composition, position-specific trinucleotide propensity based on single-strand, and DNA strand features and using five popular tree-based ensemble learning algorithms to build a stacked model. Both 5-fold cross-validation tests using benchmark datasets and independent tests using the newly collected independent test dataset showed that SELECTOR outperformed state-of-the-art methods in both general and specific types of promoter prediction in Escherichia coli. Furthermore, this novel framework provides essential interpretations that aid understanding of model success by leveraging the powerful Shapley Additive exPlanation algorithm, thereby highlighting the most important features relevant for predicting both general and specific types of promoters and overcoming the limitations of existing ‘Black-box’ approaches that are unable to reveal causal relationships from large amounts of initially encoded features.

show abstract

Parallel Implementation of the Discontinuous Galerkin Method

Baggag

Atkins

Keyes

2000

View full text Add to dashboard Cite

This paper describes a parallel implementation of the discontinuous Galerkin method. The discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects. MotivationThe discontinuous Galerkin (DG) method is a robust and compact finite element projection method that provides a practical framework for the development of high-order accurate methods using unstructured grids. The method is well suited for large-scale time-dependent computations in which high accuracy is required. An important distinction between the DG method and the usual finite-element method is that in the DG method the resulting equations are local to the generating element. The solution within each element is not reconstructed by looking to neighboring elements. Thus, each element may be thought of as a separate entity that merely needs to obtain some boundary data from its neighbors. The compact form of the DG method makes it well suited for parallel computer platforms. This compactness also allows a heterogeneous treatment of problems. That is, the element topology, the degree of approximation and even the choice *

show abstract

PROSPECT: A web server for predicting protein histidine phosphorylation sites

Chen

Zhao

et al. 2020

J. Bioinform. Comput. Biol.

View full text Add to dashboard Cite

Background: Phosphorylation of histidine residues plays crucial roles in signaling pathways and cell metabolism in prokaryotes such as bacteria. While evidence has emerged that protein histidine phosphorylation also occurs in more complex organisms, its role in mammalian cells has remained largely uncharted. Thus, it is highly desirable to develop computational tools that are able to identify histidine phosphorylation sites. Result: Here, we introduce PROSPECT that enables fast and accurate prediction of proteome-wide histidine phosphorylation substrates and sites. Our tool is based on a hybrid method that integrates the outputs of two convolutional neural network (CNN)-based classifiers and a random forest-based classifier. Three features, including the one-of-K coding, enhanced grouped amino acids content (EGAAC) and composition of k-spaced amino acid group pairs (CKSAAGP) encoding, were taken as the input to three classifiers, respectively. Our results show that it is able to accurately predict histidine phosphorylation sites from sequence information. Our PROSPECT web server is user-friendly and publicly available at http://PROSPECT.erc.monash.edu/ . Conclusions: PROSPECT is superior than other pHis predictors in both the running speed and prediction accuracy and we anticipate that the PROSPECT webserver will become a popular tool for identifying the pHis sites in bacteria.

show abstract

Resilience analytics: coverage and robustness in multi-modal transportation networks

2018

View full text Add to dashboard Cite

A multi-modal transportation system of a city can be modeled as a multiplex network with different layers corresponding to different transportation modes. These layers include, but are not limited to, bus network, metro network, and road network. Formally, a multiplex network is a multilayer graph in which the same set of nodes are connected by different types of relationships. Intra-layer relationships denote the road segments connecting stations of the same transportation mode, whereas inter-layer relationships represent connections between different transportation modes within the same station. Given a multi-modal transportation system of a city, we are interested in assessing its quality or efficiency by estimating the coverage i.e., a portion of the city that can be covered by a random walker who navigates through it within a given time budget, or steps. We are also interested in the robustness of the whole transportation system which denotes the degree to which the system is able to withstand a random or targeted failure affecting one or more parts of it. Previous approaches proposed a mathematical framework to numerically compute the coverage in multiplex networks. However solutions are usually based on eigenvalue decomposition, known to be time consuming and hard to obtain in the case of large systems. In this work, we propose MUME, an efficient algorithm for Multi-modal Urban Mobility Estimation, that takes advantage of the special structure of the supra-Laplacian matrix of the transportation multiplex, to compute the coverage of the system. We conduct a comprehensive series of experiments to demonstrate the effectiveness and efficiency of MUME on both synthetic and real transportation networks of various cities such as Paris, London, New York and Chicago. A future goal is to use this experience to make projections for a fast growing city like Doha.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.