Functional enrichment analysis is an essential task for the interpretation of gene lists derived from large-scale genetic, transcriptomic and proteomic studies. WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) has become one of the popular software tools in this field since its publication in 2005. Over the last 7 years, WebGestalt data holdings have grown substantially to satisfy the requirements of users from different research areas. The current version of WebGestalt supports 8 organisms and 201 gene identifiers from various databases and different technology platforms, making it directly available to the fast-growing omics community. Meanwhile, by integrating functional categories derived from centrally and publicly curated databases as well as computational analyses, WebGestalt has significantly increased the coverage of functional categories in various biological contexts including Gene Ontology, pathway, network module, gene–phenotype association, gene–disease association, gene–drug association and chromosomal location, leading to a total of 78 612 functional categories. Finally, new interactive features, such as pathway maps, hierarchical network visualization and phenotype ontology visualization, have been added to WebGestalt to help users better understand the enrichment results. WebGestalt can be freely accessed through http://www.webgestalt.org or http://bioinfo.vanderbilt.edu/webgestalt/.
Background: Answering questions such as "Which genes are related to breast cancer?" usually requires retrieving relevant publications through the PubMed search engine, reading these publications, and creating gene lists. This process is not only time-consuming, but also prone to errors. Results: We report GLAD4U (Gene List Automatically Derived For You), a new, free web-based gene retrieval and prioritization tool. GLAD4U takes advantage of existing resources of the NCBI to ensure computational efficiency. The quality of gene lists created by GLAD4U for three Gene Ontology (GO) terms and three disease terms was assessed using corresponding "gold standard" lists curated in public databases. For all queries, GLAD4U gene lists showed very high recall but low precision, leading to a low F-measure. By comparison, EBIMed's recall was consistently lower than GLAD4U's, but its precision was higher. To present the most relevant genes at the top of a list, we studied two prioritization methods based on publication count and the hypergeometric test, and compared the ranked lists and those generated by EBIMed to the gold standards. Both GLAD4U methods outperformed EBIMed for all queries based on a variety of quality metrics. Moreover, the hypergeometric method allowed for better performance by thresholding out genes with low scores. In addition, manual examination suggests that many false positives could be explained by the incompleteness of the gold standards. The GLAD4U user interface accepts any valid PubMed query, and its output page displays the ranked gene list and information associated with each gene, chronologically ordered supporting publications, a summary of the run, and links for file export, functional enrichment analysis and protein interaction network analysis. Conclusions: GLAD4U has a high overall recall. Although precision is generally low, the prioritization methods successfully rank truly relevant genes at the top of the lists to facilitate efficient browsing. GLAD4U is simple to use, and its interface can be found at: http://bioinfo.vanderbilt.edu/glad4u.
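The evaluation metrics and the hypergeometric ranking idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the GLAD4U implementation: the function names, the input layout, and in particular the way the hypergeometric test is parameterized (publications linked to a gene among the publications retrieved by a query, against a background of all gene-annotated publications) are assumptions made for illustration only.

```python
from math import comb


def precision_recall_f(retrieved, gold):
    """Precision, recall and F-measure of a retrieved gene list
    against a gold-standard gene list."""
    retrieved, gold = set(retrieved), set(gold)
    true_positives = len(retrieved & gold)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    Assumed setup (hypothetical, not taken from the paper):
      N = publications in the background,
      K = background publications linked to the gene,
      n = publications retrieved by the query,
      k = retrieved publications linked to the gene.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def rank_genes(counts, n, N, threshold=1.0):
    """Rank genes by ascending p-value, dropping those above `threshold`.

    `counts` maps gene -> (k, K) as defined in hypergeom_pvalue.
    """
    scored = [(gene, hypergeom_pvalue(k, n, K, N))
              for gene, (k, K) in counts.items()]
    kept = [(gene, p) for gene, p in scored if p <= threshold]
    return sorted(kept, key=lambda item: item[1])
```

Under these assumptions, a gene with 5 of its 10 publications in a 20-publication result set ranks above a gene with 5 of its 500 publications in the same set, which is the kind of behavior that ranking by raw publication count alone would not capture.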
Identification and annotation of mutated genes or proteins involved in oncogenesis and tumor progression are crucial for both cancer biology and clinical applications. We have developed a human Cancer Proteome Variation Database (CanProVar) by integrating information on protein sequence variations from various public resources, with a focus on cancer-related variations (crVAR). We have also built a user-friendly interface for querying the database. The current version of CanProVar comprises 8,570 crVARs in 2,921 proteins derived from existing genome variation databases and recently published large-scale cancer genome resequencing studies. It also includes 41,541 non-cancer specific variations (ncsVARs) in 30,322 proteins derived from the dbSNP database. CanProVar provides quick access to known crVARs in protein sequences along with related cancer samples, relevant publications, data sources, and functional information such as Gene Ontology (GO) annotations for the proteins, protein domains in which the variation occurs, and protein interaction partners with crVARs. CanProVar also helps reveal functional characteristics of crVARs and proteins bearing these variations. Our analysis showed that crVARs were enriched in certain protein domains. We also showed that proteins bearing crVARs were more likely to interact with each other in the protein interaction network. CanProVar can be accessed from http://bioinfo.vanderbilt.edu/canprovar.
A method for the rapid correlation of tandem mass spectra to a list of protein sequences in a database has been developed. The combination of the fast and accurate computational search algorithm, X!Tandem, and a Linux cluster parallel computing environment with PVM or MPI significantly reduces the time required to perform the correlation of tandem mass spectra to protein sequences in a database. A file of tandem mass spectra is divided into a specified number of files, each containing an equal number of the spectra from the larger file. These files are then searched in parallel against a protein sequence database. The output files of the parallel searches are collated into one file for viewing through a web interface. Thousands of spectra can be searched in an accurate, practical, and time-effective manner. The source code for running Parallel Tandem utilizing either PVM or MPI on the Linux operating system is available from http://www.thegpm.org. This source code is made available under the Artistic License from the authors.
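The split, parallel search and collate pattern described in this abstract can be sketched in a few lines of Python. This is only an illustration of the workflow, not Parallel Tandem itself: a local thread pool stands in for the PVM or MPI cluster, and `search_chunk` is a hypothetical placeholder for an X!Tandem search that simply reports whether each spectrum label appears in a toy "database" set.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(spectra, n_parts):
    """Divide a list of spectra into n_parts consecutive chunks whose
    sizes differ by at most one."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(spectra), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(spectra[start:start + size])
        start += size
    return chunks


def search_chunk(chunk, database):
    """Hypothetical stand-in for a real search engine: a spectrum
    'matches' if its label is present in the database set."""
    return [(spectrum, spectrum in database) for spectrum in chunk]


def parallel_search(spectra, database, n_parts):
    """Search each chunk concurrently, then collate the partial results
    into one list in the original spectrum order."""
    chunks = split_evenly(spectra, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # pool.map returns results in the order the chunks were submitted.
        partial_results = list(
            pool.map(lambda chunk: search_chunk(chunk, database), chunks)
        )
    return [hit for part in partial_results for hit in part]
```

Because the chunks are consecutive slices and the pool returns results in submission order, collation here is a simple concatenation; on a real cluster the equivalent step would merge the per-node output files.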