BackgroundThe Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.ResultsHere, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aureginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory.ConclusionWe conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Here we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aureginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility (P. aureginosa only). We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. We conclude that, while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. We finally report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bioontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens. 157 project. Predicting GO terms for a protein (protein-centric) and predicting which proteins are associated 158 with a given function (term-centric) are related but different computational problems: the former is a 159 multi-label classification problem with a structured output, while the latter is a binary classification task. 160Predicting the results of a genome-wide screen for a single or a small number of functions fits the term-centric 161 formulation. To see how well all participating CAFA methods perform term-centric predictions, we mapped 162 results from the protein-centric CAFA3 methods onto these terms. In addition we held a separate CAFA 163 challenge, CAFA-π whose purpose was to attract additional submissions from algorithms that specialize in 164 term-centric tasks. 165 We performed screens for three functions in three species, which we then used to assess protein function 166 prediction. In the bacterium Pseudomonas aeruginosa and the fungus Candida albicans we performed 167 genome-wide screens capable of uncovering genes with two functions, biofilm formation (GO:0042710) and 168 motility (for P. aeruginosa only) (GO:0001539), as described in Methods. In Drosophila melanogaster we 169 performed targeted assays, guided by previous CAFA submissions, of a ...
Pseudomonas aeruginosa has been amongst the top 10 'superbugs' worldwide and is causing infections with poor outcomes in both humans and animals. From 202 P. aeruginosa isolates (n = 121 animal and n = 81 human), 40 were selected on the basis of biofilm-forming ability and were comparatively characterized in terms of virulence determinants to the type strain P. aeruginosa PAO1. Biofilm formation, pyocyanin and hemolysin production, and bacterial motility patterns were compared with the ability to kill human cell line A549 in vitro. On average, there was no significant difference between levels of animal and human cytotoxicity, while human isolates produced higher amounts of pyocyanin, hemolysins and showed increased swimming ability. Non-parametric statistical analysis identified the highest positive correlation between hemolysis and the swarming ability. For the first time an ensemble machine learning approach used on the in vitro virulence data determined the highest relative predictive importance of the submerged biofilm formation for the cytotoxicity, as an indicator of the infection ability. The findings from the in vitro study were validated in vivo using zebrafish (Danio rerio) embryos. This study highlighted no major differences between P. aeruginosa species isolated from animal and human infections and the importance of pyocyanin production in cytotoxicity and infection ability.
Intrinsically disordered proteins (IDPs) are characterized by the lack of a fixed tertiary structure and are involved in the regulation of key biological processes via binding to multiple protein partners. IDPs are malleable, adapting to structurally different partners, and this flexibility stems from features encoded in the primary structure. The assumption that universal sequence information will facilitate coverage of the sparse zones of the human interactome motivated us to explore the possibility of predicting protein-protein interactions (PPIs) that involve IDPs based on sequence characteristics. We developed a method that relies on features of the interacting and non-interacting protein pairs and utilizes machine learning to classify and predict IDP PPIs. Consideration of both sequence determinants specific for conformational organizations and the multiplicity of IDP interactions in the training phase ensured a reliable approach that is superior to current state-of-the-art methods. By applying a strict evaluation procedure, we confirm that our method predicts interactions of the IDP of interest even on the proteome-scale. This service is provided as a web tool to expedite the discovery of new interactions and IDP functions with enhanced efficiency.
http://www.vin.bg.ac.rs/180/tools/tfpred.php CONTACT: vladaper@vinca.rs; nevenav@vinca.rsSupplementary information: Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.