A consensus approach for the assignment of structural domains in proteins is presented. The approach combines a number of previously published algorithms and takes advantage of the elevated accuracy obtained when assignments from the individual algorithms are in agreement. The consensus approach is tested on a data set of 55 protein chains for which domain assignments from four automated methods were known and for which crystallographers' assignments had been reported in the literature. In this test, accuracy increased from 72% using individual algorithms to 100% when all four methods were in agreement. However, a consensus prediction using all four methods was possible for only 52% of the data set. The consensus approach, using three publicly available domain assignment algorithms (PUU, DETECTIVE, DOMAK), was then used to make domain assignments for a data set of 787 protein chains from the Protein Data Bank. Analysis showed that 55.7% of assignments could be made automatically and that, of these, 13.5% were multi-domain proteins. Of the remaining 44.3% that could not be assigned by the consensus procedure, 90.4% had their domain boundaries assigned correctly by at least one of the algorithms. Once identified, these domains were analyzed for trends in their size and secondary structure class. In addition, the discontinuity of each domain along the protein chain was considered.
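The consensus idea above can be illustrated with a minimal Python sketch. It is not the published procedure: the abstract does not state how agreement between algorithms is measured, so the criterion used here (equal domain counts plus a residue-overlap threshold, `min_overlap`) is an assumption, as are the function names and the example boundaries.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# A domain is a list of (start, end) residue segments, so discontinuous
# domains can be represented; an assignment is a list of domains.
Segment = Tuple[int, int]
Domain = List[Segment]
Assignment = List[Domain]


def residues(domain: Domain) -> Set[int]:
    """Expand a domain's segments into the set of residue numbers it covers."""
    covered: Set[int] = set()
    for start, end in domain:
        covered.update(range(start, end + 1))
    return covered


def agree(a: Assignment, b: Assignment, min_overlap: float = 0.85) -> bool:
    """Assumed agreement test (not taken from the paper): both assignments
    have the same number of domains, and every domain in `a` can be paired
    with a distinct domain in `b` whose Jaccard overlap is >= min_overlap."""
    if len(a) != len(b):
        return False
    unmatched = [residues(d) for d in b]
    for dom in a:
        res_a = residues(dom)
        match = None
        for i, res_b in enumerate(unmatched):
            union = res_a | res_b
            if union and len(res_a & res_b) / len(union) >= min_overlap:
                match = i
                break
        if match is None:
            return False
        unmatched.pop(match)
    return True


def consensus(assignments: Dict[str, Assignment],
              min_overlap: float = 0.85) -> Optional[Assignment]:
    """Return an assignment only if every pair of methods agrees;
    otherwise return None, flagging the chain for manual inspection."""
    results = list(assignments.values())
    if not results:
        return None
    for a, b in combinations(results, 2):
        if not agree(a, b, min_overlap):
            return None
    return results[0]


# Made-up boundaries for a hypothetical 200-residue, two-domain chain.
agreeing = {
    "PUU": [[(1, 100)], [(101, 200)]],
    "DETECTIVE": [[(1, 98)], [(99, 200)]],
    "DOMAK": [[(3, 100)], [(101, 200)]],
}
disagreeing = dict(agreeing, DOMAK=[[(1, 200)]])
```

Here `consensus(agreeing)` returns the first method's assignment because all three pairs pass the overlap test, while `consensus(disagreeing)` returns `None` because one method reports a single domain. Requiring pairwise agreement rather than agreement with one reference method is a design choice made for this sketch, since a thresholded overlap is not transitive.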
The CATH database of protein domain structures classifies structures according to their (C)lass, (A)rchitecture, (T)opology or fold and (H)omologous family (http://www.biochem.ucl.ac.uk/bsm/cath). Although the protocol used is mostly automatic, manual inspection is used to check assignments at some critical stages, such as the detection of very distantly related homologues and analogues and the assignment of novel architectures. Described in this article is a recently established facility to search the database with the coordinates of a newly determined structure. The CATH server first locates domain boundaries and then uses automatic sequence and structure comparison methods to assign this new structure to one or more of the domain families within CATH. Diagnostic reports are generated, together with multiple structural alignments for close relatives. The server can be accessed over the World Wide Web (WWW), and mirror sites are planned to improve access.
A program is described for automatically generating schematic linear representations of protein chains in terms of their structural domains. The program requires the co-ordinates of the chain, the domain assignment, PROSITE information and a file listing all intermolecular interactions in the protein structure. The output is a PostScript file in which each protein is represented by a set of linked boxes, each box corresponding to all or part of a structural domain. PROSITE motifs and residues involved in ligand interactions are highlighted. The diagrams allow immediate visualization of the domain arrangement within a protein chain, and by providing information on sequence motifs, and metal ion, ligand and DNA binding at the domain level, the program facilitates detection of remote evolutionary relationships between proteins.
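As a rough illustration of the kind of output described above, the following Python sketch writes a PostScript page in which a chain is drawn as a line with one box per domain segment. It is not the program from the abstract: the function name, the input layout, the page coordinates and the grey shades are all hypothetical, and PROSITE motifs and ligand-binding residues are left out.

```python
from typing import List, Tuple

# Each segment is (start_residue, end_residue, domain_index). A discontinuous
# domain appears as several segments that share one domain_index.
SegmentSpec = Tuple[int, int, int]

# Hypothetical grey levels used to tell domains apart.
GRAYS = [0.9, 0.6, 0.3]


def domain_diagram_ps(chain_length: int,
                      segments: List[SegmentSpec],
                      width: float = 400.0,
                      height: float = 20.0,
                      x0: float = 50.0,
                      y0: float = 400.0) -> str:
    """Return PostScript text drawing the chain as linked boxes."""
    scale = width / chain_length          # points per residue
    mid = y0 + height / 2.0
    lines = ["%!PS-Adobe-3.0", "newpath"]
    # A horizontal line spanning the whole chain links the boxes.
    lines.append(
        f"{x0:.1f} {mid:.1f} moveto {x0 + width:.1f} {mid:.1f} lineto stroke")
    for start, end, dom in segments:
        x = x0 + (start - 1) * scale
        w = (end - start + 1) * scale
        gray = GRAYS[dom % len(GRAYS)]
        # rectfill and rectstroke are PostScript Level 2 operators.
        lines.append(
            f"{gray} setgray {x:.1f} {y0:.1f} {w:.1f} {height:.1f} rectfill")
        lines.append(
            f"0 setgray {x:.1f} {y0:.1f} {w:.1f} {height:.1f} rectstroke")
    lines.append("showpage")
    return "\n".join(lines) + "\n"


# Made-up example: domain 0 is discontinuous (two segments), domain 1 is not.
ps_text = domain_diagram_ps(
    200, [(1, 100, 0), (101, 160, 1), (161, 200, 0)])
```

Writing `ps_text` to a file with a `.ps` extension should give a page that a PostScript viewer can open. Drawing one box per segment, shaded by domain, is one way to let "each box correspond to all or part of a structural domain".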