Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by the more than 500 000 genomes sequenced to date. By different estimates, only half of the predicted proteins in sequenced genomes carry an accurate functional annotation, and this percentage varies drastically between organismal lineages. Such a large knowledge gap hampers all aspects of the biological enterprise and thereby stands in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue, funded by the National Science Foundation, was held on 3–4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists in the same venue allowed for a comprehensive assessment of the current state of functional annotation of protein families. Major issues obstructing the field were identified and discussed, which ultimately led to proposed solutions for moving forward.
The subcellular localization of a protein is one of the key characteristics for understanding its biological function. After synthesis, proteins are transported to specific organelles and suborganelles, and they participate in cellular activity and function efficiently only when correctly localized. Incorrect subcellular localization can severely affect cellular function. Predicting protein subcellular localization is therefore an important area of protein function research and has become an active topic in bioinformatics. This review discusses recent progress in bioinformatics research on protein subcellular localization and its prospects.
Orthology prediction is challenging yet rewarding. Orthologs are the cornerstone of almost all comparative genomics studies. Dozens of ortholog resources have become available and been widely used over the past decades. However, inconsistency between these resources has drawn growing concern, especially as more proteomes become available and ortholog databases expand. It is no longer easy to decide which ortholog database to use, or to compare conclusions based on different resources. Here we present a metric to assess ortholog functional consistency. Using this metric, we built a network connecting proteins based on their functional similarity. We then detected network communities as ortholog groups, and each protein in an ortholog group inherited its network degree centrality. By benchmarking Quest for Orthologs (QfO) and several representative ortholog resources, we concluded that degree centrality could serve as an index of the reliability of functional consistency. The numerical nature of degree centrality also opens the door to quantitative analysis in pan-genome and other comparative genomics studies.
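The network-based workflow described in this abstract (functional similarity, then a protein network, then communities, then degree centrality) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and not the authors' method: the abstract does not name its similarity metric or community detection algorithm, so Jaccard similarity over annotation term sets, a fixed threshold of 0.5, and connected components stand in for them. The protein IDs and term labels are hypothetical toy data.

```python
from itertools import combinations

# Hypothetical toy annotations: protein ID -> set of functional terms.
annotations = {
    "A1": {"GO:1", "GO:2", "GO:3"},
    "B1": {"GO:1", "GO:2", "GO:3"},
    "C1": {"GO:1", "GO:2"},
    "A2": {"GO:7", "GO:8"},
    "B2": {"GO:7", "GO:8", "GO:9"},
    "X": {"GO:5"},
}


def jaccard(a, b):
    """Jaccard similarity of two term sets (assumed stand-in metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(annotations, threshold=0.5):
    """Link two proteins when their similarity meets the assumed threshold."""
    graph = {protein: set() for protein in annotations}
    for p, q in combinations(annotations, 2):
        if jaccard(annotations[p], annotations[q]) >= threshold:
            graph[p].add(q)
            graph[q].add(p)
    return graph


def connected_components(graph):
    """Group proteins by connected component, a simple stand-in for
    community detection."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {protein: 0.0 for protein in graph}
    return {protein: len(nbrs) / (n - 1) for protein, nbrs in graph.items()}


graph = build_network(annotations)
groups = connected_components(graph)
centrality = degree_centrality(graph)

for group in groups:
    print(sorted(group), {p: round(centrality[p], 2) for p in sorted(group)})
```

In this toy network, proteins whose annotations agree with more of their neighbours receive a higher centrality, which is the intuition behind using centrality as a per-protein score. A real analysis would likely use a dedicated graph library and a proper community detection algorithm in place of connected components.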