Orthology prediction is challenging yet rewarding. Orthologs are the cornerstone of almost all comparative genomics studies. Dozens of ortholog resources have become available and been widely used over the past decades. However, the inconsistency between these resources has drawn growing concern, especially as more proteomes become available and ortholog databases expand. It is no longer easy to decide which ortholog database to use, or to compare conclusions based on different resources. Here we present a metric to assess ortholog functional consistency. Using this metric, we built a network connecting proteins based on their functional similarity. We then detected network communities as ortholog groups, and each protein in an ortholog group inherited its degree centrality from the network. By benchmarking Quest for Orthologs (QfO) and several representative ortholog resources, we concluded that degree centrality could serve as an index of the reliability of functional consistency. The numerical nature of degree centrality also opens the door to quantitative studies in pan-genomics and other areas of comparative genomics.
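The pipeline described above (similarity network, communities as ortholog groups, degree centrality per protein) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the protein names, similarity scores, and the 0.5 threshold are hypothetical, and connected components stand in for whatever community detection method the authors actually used, which the abstract does not specify.

```python
def build_network(similarities, threshold):
    """Link two proteins when their similarity score reaches the threshold."""
    adjacency = {}
    for (a, b), score in similarities.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Stand-in for community detection: group proteins that are reachable
    from one another (an assumption, not the authors' method)."""
    seen = set()
    groups = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


# Hypothetical pairwise functional-similarity scores.
similarities = {
    ("A1", "B1"): 0.90,
    ("A1", "C1"): 0.80,
    ("B1", "C1"): 0.85,
    ("C1", "D1"): 0.60,
    ("A2", "B2"): 0.70,
    ("A1", "A2"): 0.10,  # below threshold, so no edge
}

network = build_network(similarities, threshold=0.5)
groups = connected_components(network)
centrality = degree_centrality(network)
```

In this toy network, a protein attached to its group by a single edge (`D1`) receives a lower centrality than a well-connected one (`C1`), which is the kind of numerical signal the abstract proposes as a reliability index.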
A new type of residue interaction network, the residue interaction energy network (RINN), is built. A multi-objective optimization algorithm for dynamic network community discovery, T-DYNMOGA-Q_w, is then proposed to detect communities in dynamic RINNs. T-DYNMOGA-Q_w sets a threshold during initialization and optimizes the weighted modularity Q_w as the objective function. Setting the threshold helps find the stable structure in the dynamic RINN. Using Q_w as the objective function overcomes the resolution limit of modularity. After community detection on the dynamic RINNs of wild-type lipase (WTL) and its mutant 6B from 300 K to 400 K, it is found that the communities in the 6B network maintain a tight structure even at higher temperatures. Stable communities benefit the heat resistance of lipase 6B. The hydrogen bonds between the mutated Ser15 and Ser17, and those of Glu20 with other residues, improve structural stability. The mutations L114P, M134E, M137P, and S163P enhance the rigidity of the flexible region and tighten the secondary structure, which stabilizes the protein structure. INDEX TERMS Residue interaction energy network, community detection, lipase thermostability.
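To make the objective function concrete, the sketch below computes the standard Newman weighted modularity, Q = Σ_c [W_c / W − (S_c / 2W)²], where W is the total edge weight, W_c the weight inside community c, and S_c the summed node strength of c. This is an assumption for illustration: the abstract does not give the exact form of Q_w used by T-DYNMOGA-Q_w, which may differ, and the residue labels and interaction weights are hypothetical.

```python
def weighted_modularity(edges, community_of):
    """Newman weighted modularity for an undirected graph.

    edges: {(u, v): weight}, each edge listed once, no self-loops.
    community_of: {node: community label}.
    """
    total_weight = sum(edges.values())  # W
    if total_weight == 0:
        return 0.0

    strength = {}         # node -> sum of incident edge weights
    internal_weight = {}  # community -> weight of edges inside it (W_c)
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
        if community_of[u] == community_of[v]:
            c = community_of[u]
            internal_weight[c] = internal_weight.get(c, 0.0) + w

    community_strength = {}  # community -> summed node strength (S_c)
    for node, s in strength.items():
        c = community_of[node]
        community_strength[c] = community_strength.get(c, 0.0) + s

    q = 0.0
    for c, s_c in community_strength.items():
        q += internal_weight.get(c, 0.0) / total_weight
        q -= (s_c / (2.0 * total_weight)) ** 2
    return q


# Hypothetical network: two tightly coupled residue triplets joined by
# one weaker interaction.
edges = {
    ("R1", "R2"): 2.0, ("R2", "R3"): 2.0, ("R1", "R3"): 2.0,
    ("R4", "R5"): 2.0, ("R5", "R6"): 2.0, ("R4", "R6"): 2.0,
    ("R3", "R4"): 1.0,
}
two_communities = {"R1": 0, "R2": 0, "R3": 0, "R4": 1, "R5": 1, "R6": 1}
one_community = {node: 0 for node in two_communities}

q_split = weighted_modularity(edges, two_communities)
q_merged = weighted_modularity(edges, one_community)
```

Splitting the toy network along the weak interaction scores higher than lumping all residues together, which is the behaviour a modularity-maximizing search exploits.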
Ortholog prediction, essential for various genomic research areas, faces growing inconsistencies amidst the expanding array of ortholog databases. The common strategy of computing consensus orthologs introduces additional arbitrariness, underscoring the need to identify proteins prone to ortholog prediction inconsistency. To address this, we introduce the Signal Jaccard Index (SJI), a novel metric based on unsupervised genome context clustering, to assess protein similarity. Utilizing SJI, we construct a protein network, revealing that proteins at the network peripheries primarily contribute to prediction inconsistency. Importantly, we show that a protein's degree centrality can gauge its assignment reliability to a consensus set, facilitating the refinement of ortholog predictions.
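The abstract does not define the Signal Jaccard Index, so the sketch below shows only the plain Jaccard index it is named after, J(A, B) = |A ∩ B| / |A ∪ B|, applied to sets of genome-context cluster labels. The protein names and cluster labels are hypothetical, and how SJI derives its "signals" from unsupervised clustering is not reproduced here.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical genome-context cluster labels assigned to each protein.
context_clusters = {
    "protX": {"c1", "c2", "c3"},
    "protY": {"c2", "c3", "c4"},
    "protZ": {"c9"},
}

# Pairwise similarity for every unordered pair of proteins.
scores = {
    (p, q): jaccard_index(context_clusters[p], context_clusters[q])
    for p, q in combinations(sorted(context_clusters), 2)
}
```

Pairwise scores of this kind could then serve as edge weights in a protein network; a protein such as `protZ`, which shares no context with the others, would sit at the periphery.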