The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements to the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and thereby helping to illuminate the ‘dark’ proteome.
There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs.
Short abstract: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.
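As an illustration of the kind of statistical measure the review refers to, the sketch below computes Shannon entropy over the residue composition of sliding windows and reports windows that fall below a cutoff. This is only one common way to quantify low complexity and is not necessarily the measure the authors use or recommend; the function names, window length and threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    if not segment:
        raise ValueError("segment must not be empty")
    n = len(segment)
    counts = Counter(segment)
    # Sum of p * log2(1/p) over the residue types present in the segment.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def low_complexity_windows(sequence: str, window: int = 12, threshold: float = 2.2):
    """Return (start, entropy) for each window whose entropy is <= threshold.

    The window length and threshold are illustrative defaults (assumptions),
    not values taken from the review.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        h = shannon_entropy(sequence[start:start + window])
        if h <= threshold:
            hits.append((start, h))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: a poly-Q run followed by a compositionally diverse stretch.
    example = "QQQQQQQQQQQQACDEFGHIKLMN"
    for start, h in low_complexity_windows(example, window=12, threshold=1.0):
        print(start, round(h, 3))
```

A composition-only score like this cannot distinguish a disordered homopolymeric stretch from an ordered repeat, which is consistent with the review's argument that statistical measures alone do not capture the structural aspects of LCRs.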
To assess the role of core metabolism genes in bacterial virulence, independently of their effect on growth, we correlated the genome, the transcriptome and the pathogenicity in flies and mice of 30 fully sequenced Pseudomonas strains. Gene presence correlates robustly with pathogenicity differences among all Pseudomonas species, but not among the P. aeruginosa strains. However, gene expression differences are evident between highly and lowly pathogenic P. aeruginosa strains in multiple virulence factors and a few metabolism genes. Moreover, a noticeable fraction (16.5%) of the core metabolism genes of P. aeruginosa strain PA14, compared to 8.5% of the non-metabolic genes tested, appear necessary for full virulence when mutated. Most of these virulence-defective core metabolism mutants are compromised in at least one key virulence mechanism independently of auxotrophy. A pathway-level analysis of PA14 core metabolism uncovers beta-oxidation and the biosynthesis of amino acids, succinate, citramalate, and chorismate to be important for full virulence. Strikingly, the relative expression among P. aeruginosa strains of genes belonging to these metabolic pathways is indicative of their pathogenicity. Thus, P. aeruginosa strain-to-strain virulence variation remains largely obscure at the genome level, but can be dissected at the pathway level via functional transcriptomics of core metabolism.
Haemoglobinopathies are common monogenic disorders with diverse clinical manifestations, partly attributed to the influence of modifier genes. Recent years have seen enormous growth in the amount of genetic data, instigating the need for ranking methods to identify candidate genes with strong modifying effects. Here, we present the first evidence-based gene ranking metric (IthaScore) for haemoglobinopathy-specific phenotypes by utilising curated data in the IthaGenes database. IthaScore successfully reflects current knowledge for well-established disease modifiers, while it can be dynamically updated with emerging evidence. Protein–protein interaction (PPI) network analysis and functional enrichment analysis were employed to identify new potential disease modifiers and to evaluate the biological profiles of selected phenotypes. The most relevant gene ontology (GO) and pathway gene annotations for (a) haemoglobin (Hb) F levels/Hb F response to hydroxyurea included urea cycle, arginine metabolism and vascular endothelial growth factor receptor (VEGFR) signalling, (b) response to iron chelators included xenobiotic metabolism and glucuronidation, and (c) stroke included cytokine signalling and inflammatory reactions. Our findings demonstrate the capacity of IthaGenes, together with dynamic gene ranking, to expand knowledge on the genetic and molecular basis of phenotypic variation in haemoglobinopathies and to identify additional candidate genes to potentially inform and improve diagnosis, prognosis and therapeutic management.