The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here, we investigate just how prevalent the “few articles - many proteins” phenomenon is. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass-spectrometry, microscopy and RNAi experiments dominate high-throughput experiments. Consequently, the functional information derived from these experiments is mostly of the subcellular location of proteins, and of the participation of proteins in embryonic developmental pathways. For some organisms, the information provided by different studies overlaps substantially. We also show that the information provided by high-throughput experiments is less specific than that provided by low-throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist and understanding their characteristics and extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.
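As a rough illustration of the kind of figure quoted in the abstract above (a small fraction of articles accounting for a large share of annotated proteins), the following Python sketch computes such a fraction from (article, protein) pairs. It is not the authors' method: the function name, the input format, and the largest-article-first counting rule are all assumptions made for illustration.

```python
from collections import defaultdict


def article_fraction_for_coverage(annotations, protein_share=0.25):
    """Fraction of articles needed to cover `protein_share` of all proteins.

    `annotations` is an iterable of (article_id, protein_id) pairs
    (a hypothetical input format). Articles are taken largest first,
    ranked by the number of distinct proteins they annotate.
    """
    proteins_by_article = defaultdict(set)
    for article_id, protein_id in annotations:
        proteins_by_article[article_id].add(protein_id)

    all_proteins = set().union(*proteins_by_article.values())
    target = protein_share * len(all_proteins)

    # Rank articles by how many distinct proteins each one annotates.
    ranked = sorted(proteins_by_article.values(), key=len, reverse=True)

    covered = set()
    for n_used, proteins in enumerate(ranked, start=1):
        covered |= proteins
        if len(covered) >= target:
            return n_used / len(ranked)
    return 1.0
```

Applied to a full set of experimentally supported annotations, a very small return value would indicate that a handful of high-throughput articles dominate the annotated proteins.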
The COVID-19 pandemic caused by the novel SARS-CoV-2 is more contagious than other coronaviruses and has higher rates of mortality than influenza. Identification of effective therapeutics is a crucial tool to treat those infected with SARS-CoV-2 and limit the spread of this novel disease globally. We deployed a bioinformatics workflow to identify candidate drugs for the treatment of COVID-19. Using an “omics” repository, the Library of Integrated Network-Based Cellular Signatures (LINCS), we simultaneously probed transcriptomic signatures of putative COVID-19 drugs and publicly available SARS-CoV-2 infected cell lines to identify novel therapeutics. We identified a shortlist of 20 candidate drugs: 8 are already under trial for the treatment of COVID-19; the remaining 12 have antiviral properties, and 6 have antiviral efficacy against coronaviruses specifically, in vitro. All candidate drugs are either FDA approved or under investigation. Our candidate drug findings are discordant with (i.e., reverse) SARS-CoV-2 transcriptome signatures generated in vitro, and a subset, such as the MEK inhibitor selumetinib, is also identified in transcriptome signatures generated from COVID-19 patient samples. Overall, our findings provide additional support for drugs that are already being explored as therapeutic agents for the treatment of COVID-19 and identify promising novel targets that are worthy of further investigation.
The COVID-19 pandemic caused by the novel SARS-CoV-2 is more contagious than other coronaviruses and has higher rates of mortality than influenza. As no vaccine or drugs are currently approved to specifically treat COVID-19, identification of effective therapeutics is crucial to treat the afflicted and limit disease spread. We deployed a bioinformatics workflow to identify candidate drugs for the treatment of COVID-19. Using an “omics” repository, the Library of Integrated Network-Based Cellular Signatures (LINCS), we simultaneously probed transcriptomic signatures of putative COVID-19 drugs and signatures of coronavirus-infected cell lines to identify therapeutics with concordant signatures and discordant signatures, respectively. Our findings include three FDA approved drugs that have established antiviral activity, including protein kinase inhibitors, providing a promising new category of candidates for COVID-19 interventions.
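To make the idea of concordant and discordant signatures concrete, here is a minimal Python sketch that scores two transcriptomic signatures by Pearson correlation over their shared genes. This is only an illustration under stated assumptions: the abstracts do not specify the scoring used, LINCS tooling has its own similarity measures, and the function name and the gene-to-fold-change dictionary format are hypothetical.

```python
import math


def signature_similarity(sig_a, sig_b):
    """Pearson correlation of two signatures over their shared genes.

    Each signature is a dict mapping a gene identifier to a differential
    expression value such as a log fold change (assumed format).
    Values near +1 suggest concordant signatures; values near -1
    suggest discordant (reversing) signatures.
    """
    shared = sorted(set(sig_a) & set(sig_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")

    a = [sig_a[g] for g in shared]
    b = [sig_b[g] for g in shared]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)

    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a signature is constant over the shared genes")

    return cov / math.sqrt(var_a * var_b)
```

Under this sketch, a drug whose signature scores strongly negative against an infection signature would be a candidate for reversing it, while a strongly positive score against a known antiviral's signature would mark the two drugs as concordant.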
Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.
The human immune system is responsible for identification and destruction of invader cells, such as the bacterial pathogen Staphylococcus aureus. In response, S. aureus brings to the fight a large number of virulence factors, including several that allow it to evade the host immune response. The staphylococcal surface protein SdrE was recently reported to bind to complement Factor H, an important regulator of complement activation. Factor H attaches to the surface of host cells to inhibit complement activation and amplification, preventing the destruction of the host cell. SdrE binding to Factor H allows S. aureus to mimic a host cell and reduces bacterial killing by granulocytes. In a recently published study, Zhang et al. describe crystal structures of SdrE and its complex with the C-terminal portion of Factor H. The structure of SdrE and its interaction with the Factor H peptide closely resemble a family of surface proteins that recognize extracellular matrix components such as fibrinogen. However, unbound SdrE forms a novel 'Closed' conformation with an occluded peptide-binding groove. These structures reveal a fascinating mechanism for immune evasion and provide a potential avenue for the development of novel antimicrobial agents to target SdrE.