Short title: Comparative analysis of protein-protein interaction databases.

Abstract

Protein-protein interactions (PPIs) are critical, and so are the databases and tools (resources) that catalogue them. In the absence of systematic comparisons, however, biologists and bioinformaticians may be forced to choose among these protein interaction databases and tools subjectively. In fact, a comprehensive list of such bioinformatics resources has not been reported so far. For the first time, we compiled 375 PPI resources, short-listed 125 important ones and compared them in a preliminary way (both lists are publicly available at startbioinfo.com), and then systematically compared human PPIs from 16 carefully selected databases. General features were first compared in detail. We then compared quantitatively, across the chosen databases, the coverage of 'experimentally verified' versus all PPIs, as well as the coverage for disease-associated and other types of genes. This was done in two ways: using outputs obtained manually through the web interfaces, and using all interactions downloaded from the databases. For the first approach, we compared the PPIs returned by the web interfaces in response to gene queries. The query set comprised 108 genes associated with different tissues (specific to kidney, testis, and uterus, and ubiquitous) or diseases (breast cancer, lung cancer, Alzheimer's, cystic fibrosis, diabetes, and cardiomyopathy). PPI coverage for well-studied genes was also compared with that for less-studied ones. For the second approach, the back-end data from the databases were downloaded and compared. Based on the results, we recommend STRING and UniHI for retrieving the majority of 'experimentally verified' protein interactions, and hPRINT and STRING for obtaining the maximum number of 'total' (experimentally verified as well as predicted) PPIs.
The analysis of experimentally verified PPIs found exclusively in each database revealed that STRING contributed about 71% of the exclusive hits. Overall, hPRINT, STRING and IID together retrieved ~94% of the 'total' protein interactions available in the databases. The coverage of certain databases was skewed for some gene types. The results also indicate that how frequently a database is used may not correlate with its advantages, which justifies the need for more frequent studies of this nature.
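
The two summary measures mentioned above (interactions found exclusively in one database, and the share of all interactions retrieved by a chosen combination of databases) can be illustrated with plain set operations. The following is only a minimal sketch, not the procedure used in the study: the function names, the database labels `DB_A`, `DB_B`, `DB_C`, and the toy interaction lists are all hypothetical, and it assumes that interactions are undirected and that gene symbols can be matched case-insensitively.

```python
def normalize(pair):
    """Treat A-B and B-A as the same undirected interaction (assumption)."""
    a, b = pair
    return frozenset((a.upper(), b.upper()))


def normalize_all(db_to_pairs):
    """Convert every database's pair list into a set of normalized pairs."""
    return {db: {normalize(p) for p in pairs} for db, pairs in db_to_pairs.items()}


def exclusive_hits(db_to_pairs):
    """Map each database to the interactions found in no other database."""
    normalized = normalize_all(db_to_pairs)
    exclusive = {}
    for db, pairs in normalized.items():
        others = set().union(*(s for name, s in normalized.items() if name != db))
        exclusive[db] = pairs - others
    return exclusive


def combined_coverage(db_to_pairs, chosen):
    """Fraction of all distinct interactions retrieved by the chosen databases."""
    normalized = normalize_all(db_to_pairs)
    universe = set().union(*normalized.values())
    covered = set().union(*(normalized[db] for db in chosen))
    return len(covered) / len(universe) if universe else 0.0


# Hypothetical toy data, for illustration only (not taken from the study).
toy = {
    "DB_A": [("TP53", "MDM2"), ("BRCA1", "BARD1"), ("EGFR", "GRB2")],
    "DB_B": [("MDM2", "TP53"), ("APP", "PSEN1")],
    "DB_C": [("brca1", "bard1")],
}

exclusive = exclusive_hits(toy)
coverage = combined_coverage(toy, ["DB_A", "DB_C"])
```

With real downloads, identifier mapping (for example between gene symbols and protein accessions) would have to be handled before such a comparison, and that step is not covered here.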