Here, we describe a web server that integrates structural alignments with the MAFFT multiple sequence alignment (MSA) tool. For this purpose, we have prepared a web-based Database of Aligned Structural Homologs (DASH), which provides structural alignments at the domain and chain levels for all proteins in the Protein Data Bank (PDB), and can be queried interactively or by a simple REST-like API. MAFFT-DASH integration can be invoked with a single flag on either the web (
https://mafft.cbrc.jp/alignment/server/
) or command-line versions of MAFFT. In our benchmarks using 878 cases from the BAliBase, HomFam, OXFam, Mattbench and SISYPHUS datasets, MAFFT-DASH showed 10–20% improvement over standard MAFFT for MSA problems with weak similarity, in terms of Sum-of-Pairs (SP), a measure of how well a program succeeds at aligning input sequences in comparison to a reference alignment. When MAFFT alignments were supplemented with homologous sequences, further improvement was observed. Potential applications of DASH beyond MSA enrichment include functional annotation through detection of remote homology and assembly of template libraries for homology modeling.
Increasing awareness of the importance of protein–RNA interactions has motivated many approaches to predict residue-level RNA binding sites in proteins based on sequence or structural characteristics. Sequence-based predictors are usually high in sensitivity but low in specificity; conversely structure-based predictors tend to have high specificity, but lower sensitivity. Here we quantified the contribution of both sequence- and structure-based features as indicators of RNA-binding propensity using a machine-learning approach. In order to capture structural information for proteins without a known structure, we used homology modeling to extract the relevant structural features. Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity beyond what has been reported previously, including by meta-prediction servers. These features include: hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrate its use in identifying putative RNA binding sites.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.