The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43 229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616 470 domain sequences classified into 23 876 sequence families. This results in the significant expansion of the CATH HMM model library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.
We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton-sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for less than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves.
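The multi-domain percentages quoted above are, in essence, a ratio over annotated proteins. A minimal sketch of that calculation follows, assuming each protein has already been mapped to a list of domain-family assignments. The function name, the input layout and the family labels are hypothetical and are not taken from Gene3D; the sketch only illustrates the arithmetic, not the authors' pipeline.

```python
from typing import Dict, List


def multi_domain_fraction(assignments: Dict[str, List[str]]) -> float:
    """Return the fraction of annotated proteins with two or more domains.

    `assignments` maps a protein identifier to the list of domain families
    assigned to it (hypothetical layout). Proteins with no assigned domain
    are excluded from the denominator.
    """
    annotated = [domains for domains in assignments.values() if domains]
    if not annotated:
        return 0.0
    multi = sum(1 for domains in annotated if len(domains) >= 2)
    return multi / len(annotated)


# Toy data with made-up family labels, for illustration only.
toy = {
    "protA": ["fam1", "fam2"],
    "protB": ["fam1"],
    "protC": ["fam3", "fam3", "fam4"],
    "protD": [],  # no domain assigned, so it is not counted
}

print(f"{multi_domain_fraction(toy):.2%} of annotated proteins are multi-domain")
```

Excluding unannotated proteins from the denominator is a design choice made here for illustration; counting them instead would lower the reported fraction.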
Assigning biologic function to the many sequenced but still uncharacterized genes remains the greatest obstacle confronting the human genome project. Differential gene expression profiling routinely detects uncharacterized genes aberrantly expressed in conditions such as cancer but cannot determine which genes are functionally involved in such complex phenotypes. Integrating gene expression profiling with specific modulation of gene expression in relevant disease models can identify complex biologic functions controlled by currently uncharacterized genes. Here, we used systemic gene transfer in tumor-bearing mice to identify novel antiinvasive and antimetastatic functions for Fkbp8, and subsequently for Fkbp1a. Fkbp8 is a previously uncharacterized member of the FK-506-binding protein (FKBP) gene family downregulated in aggressive tumors. Antitumor effects produced by Fkbp1a gene expression are mediated by cellular pathways entirely distinct from those responsible for antitumor effects produced by Fkbp1a binding to its bacterially derived ligand, rapamycin. We then used gene expression profiling to identify syndecan 1 (Sdc1) and matrix metalloproteinase 9 (MMP9) as genes directly regulated by Fkbp1a and Fkbp8. FKBP gene expression coordinately induces the expression of the antiinvasive Sdc1 gene and suppresses the proinvasive MMP9 gene. Conversely, short interfering RNA-mediated suppression of Fkbp1a increases tumor cell invasion and MMP9 levels, while down-regulating Sdc1. Thus, syndecan 1 and MMP9 appear to mediate the antiinvasive and antimetastatic effects produced by FKBP gene expression. These studies show that uncharacterized genes differentially expressed in metastatic cancers can play important functional roles in the metastatic phenotype. Furthermore, identifying gene regulatory networks that function to control tumor progression may permit more accurate modeling of the complex molecular mechanisms of this disease.
The Gene3D release 4 database and web portal provide a combined structural, functional and evolutionary view of the protein world. The resource is focussed on providing structural annotation for protein sequences without structural representatives, including the complete proteome sets of over 240 different species. The protein sequences have also been clustered into whole-chain families to aid functional prediction. The structural annotation is generated using HMM models based on the CATH domain families; CATH is a repository for manually deduced protein domains. Changes since the last publication include the addition of over 100 genomes and the UniProt sequence database, domain data from Pfam, metabolic pathway and functional data from COGs, KEGG and GO, and protein–protein interaction data from MINT and BIND. The website has been rebuilt to allow more sophisticated querying, and the data returned is presented in a clearer format with greater functionality. Furthermore, all data can be downloaded in a simple XML format, allowing users to carry out complex investigations on their own computers.