The PRINTS database of protein 'fingerprints' is described. Fingerprints comprise sets of motifs excised from conserved regions of sequence alignments, their diagnostic power or potency being refined by iterative database scanning (in this case the OWL composite sequence database). Generally, the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3-D space. The use of groups of independent, linearly or spatially separate motifs allows particular protein folds and functionalities to be characterized more flexibly and powerfully than conventional single-component patterns or regular expressions. The current version of the database (4.0) contains 150 entries (encoding > 700 motifs), covering a wide range of globular and membrane proteins, modular polypeptides and so on. The growth of the database is influenced by a number of factors, e.g. the use of multiple motifs, the maximization of sequence information through iterative database scanning and the fact that the database searched is a large composite. The information contained within PRINTS is distinct from but complementary to the single consensus expressions stored in the widely used PROSITE dictionary of patterns.
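As a rough illustration of the multiple-motif idea described above, the following Python sketch treats a fingerprint as an ordered list of motifs, each given as a set of equal-length aligned segments. It scans a query sequence for the motifs in order and without overlap. The scoring scheme (mean column frequency), the threshold, the example segments and all function names are assumptions made for illustration; this is not the scoring used by PRINTS itself.

```python
from collections import Counter


def column_frequencies(segments):
    """Per-column residue frequencies for equal-length aligned motif segments."""
    length = len(segments[0])
    assert all(len(s) == length for s in segments)
    n = len(segments)
    return [{res: c / n for res, c in Counter(col).items()}
            for col in zip(*segments)]


def window_score(freqs, window):
    """Mean frequency of the window's residues in the motif columns (0 to 1)."""
    return sum(col.get(res, 0.0) for col, res in zip(freqs, window)) / len(freqs)


def match_fingerprint(fingerprint, sequence, threshold=0.5):
    """Return one start position per motif, matched in order without overlap.

    Motifs that are not found are reported as None. The threshold is a
    hypothetical cut-off chosen only for this sketch.
    """
    positions = []
    start = 0
    for segments in fingerprint:
        freqs = column_frequencies(segments)
        width = len(freqs)
        hit = None
        # Take the first window at or above the threshold, searching only
        # downstream of the previous motif so matches cannot overlap.
        for i in range(start, len(sequence) - width + 1):
            if window_score(freqs, sequence[i:i + width]) >= threshold:
                hit = i
                break
        positions.append(hit)
        if hit is not None:
            start = hit + width
    return positions


# Hypothetical two-motif fingerprint and query sequences.
fingerprint = [["GDSGG", "GDSGS"], ["HCAAH", "HCGAH"]]
full_match = match_fingerprint(fingerprint, "AAAGDSGGPLLHCAAHKK")
partial_match = match_fingerprint(fingerprint, "GDSGGAAAAAAA")
```

A sequence that matches only some motifs still yields a partial result, which is the flexibility a single consensus pattern lacks.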
PRINTS is a database of protein family 'fingerprints' offering a diagnostic resource for newly-determined sequences. By contrast with PROSITE, which uses single consensus expressions to characterise particular families, PRINTS exploits groups of motifs to build characteristic signatures. These signatures offer improved diagnostic reliability by virtue of the mutual context provided by motif neighbours. To date, 800 fingerprints have been constructed and stored in PRINTS. The current version, 17.0, encodes approximately 4500 motifs, covering a range of globular and membrane proteins, modular polypeptides, and so on. The database is accessible via the UCL Bioinformatics World Wide Web (WWW) Server at http://www.biochem.ucl.ac.uk/bsm/dbbrowser/. We have recently enhanced the usefulness of PRINTS by making available new, intuitive search software. This allows both individual query sequence and bulk data submission, permitting easy analysis of single sequences or complete genomes. Preliminary results indicate that use of the PRINTS system is able to assign additional functions not found by other methods, and hence offers a useful adjunct to current genome analysis protocols.
This paper introduces a method that classifies sequences using familial definitions from the PRINTS database, allowing progress to be made with the identification of distant evolutionary relationships. The approach makes use of the contextual information inherent in a multiple-motif method, and has the power to identify hitherto unidentified relationships in mass genome data. We exemplify our method by a comparison of database searches with uncharacterized sequences from the Caenorhabditis elegans and Saccharomyces cerevisiae genome projects. This analysis tool combines a simple, user-friendly interface with the capacity to provide an 'intelligent', biologically relevant result.
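One way to picture how contextual information from a multiple-motif method could feed a classification step is sketched below in Python. It assumes that per-motif match positions have already been computed for each family and ranks families by whether the matched motifs occur in the expected order and by the fraction of motifs found. The function name, the ranking rule and the family names are hypothetical; the abstract does not specify the paper's actual algorithm, so this is only an illustration of the general idea.

```python
def summarise_hits(family_hits):
    """Rank families by how completely their fingerprint matches a query.

    family_hits maps a family name to a list with one entry per motif:
    the start position of the match, or None if the motif was not found.
    """
    ranked = []
    for family, positions in family_hits.items():
        found = [p for p in positions if p is not None]
        # Motifs are expected to appear in fingerprint order along the sequence.
        in_order = all(a < b for a, b in zip(found, found[1:]))
        fraction = len(found) / len(positions) if positions else 0.0
        ranked.append((family, len(found), len(positions), fraction, in_order))
    # Correctly ordered matches first, then by fraction of motifs matched.
    ranked.sort(key=lambda r: (r[4], r[3]), reverse=True)
    return ranked


# Hypothetical match positions for one query against three families.
hits = {
    "FAMILY_A": [10, 55, None, 120],
    "FAMILY_B": [40, 12],
    "FAMILY_C": [None, None, 7],
}
ranking = summarise_hits(hits)
```

Here a partial but correctly ordered match outranks a complete match whose motifs are out of order, which reflects the weight placed on motif context rather than on any single hit.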
The Central Aspergillus Data REpository (CADRE; http://www.cadre-genomes.org.uk) is a public resource for genomic data extracted from species of Aspergillus. It provides an array of online tools for searching and visualising features of this significant fungal genus. CADRE arose from a need within the medical community to understand the human pathogen Aspergillus fumigatus. Owing to the paucity of Aspergillus genomic resources 10 years ago, the long-term goal of this project was to collate and maintain Aspergillus genomes as they became available. Since our first release in 2004, the resource has expanded to encompass annotated sequence for eight other Aspergilli and provides much-needed support to the international Aspergillus research community. Recent developments in sequencing technology, however, are generating vast amounts of genomic data, and we expect a surge of Aspergillus data shortly. In preparation for this, we have upgraded the database and software suite. This not only enables better management of more complex data sets, but also improves annotation by providing access to genome comparison data and by integrating high-throughput data.