2003
DOI: 10.1093/nar/gkg046

The InterPro Database, 2003 brings increased coverage and new features

Abstract: InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases indivi…

Cited by 648 publications (387 citation statements)
References 21 publications
“…None of the test set interactions were part of the training set. The GSN interaction set was defined as all protein pairs in which one protein was assigned the plasma membrane cellular component (1,426 proteins) and the other the nuclear cellular component (2,253), as assigned by Gene Ontology Consortium. Twenty-nine proteins that were assigned to both components were removed.…”
Section: Methods
mentioning
confidence: 99%
“…We began by assembling a collection of genomic and proteomic data potentially useful in predicting human protein-protein interactions that included model organism protein-protein interactions [1], protein domain assignments [2], gene expression measurements in human tissue samples [3] and biological function annotations [4] (Table 1). Based on previous reports, we suspected that (i) model organism interactions may suggest interactions among orthologous human proteins [5,6], (ii) similar gene expression profiles across a panel of human tissue samples may identify interacting protein products [7,8], (iii) protein domain pairs enriched among known human protein-protein interactions may suggest novel interactions [9], (iv) shared functional annotations from Gene Ontology [4] may suggest physical interactions, and (v) that combining evidence from independent data sources may strongly predict protein-protein interactions [10][11][12].…”
mentioning
confidence: 99%
“…The Analysis Server automatically creates workflows in the abstract Virtual Data Language, based on predefined templates (Section 3.5), which it then executes on distributed Grid resources such as Grid2003 and TeraGrid. The Update Server updates the Integrated Database with recently changed data from a set of monitored public databases (currently including NCBI RefSeq [22], PIR [23], InterPro [6], and KEGG [24]). In the following sections, we describe the implementation details of each of the components of GNARE.…”
Section: System Overview and Design
mentioning
confidence: 99%
“…The proteomes of these organisms differ in domain complexity from that of Arabidopsis thaliana. A preliminary analysis of InterPro (Mulder et al 2003) domain matches to each of these proteomes indicates that, on an average, each Arabidopsis protein matches 4.5 InterPro domains, whereas the corresponding number for human proteins is 9. Given that protein families usually consist of proteins with similar domain architectures, we believe that the larger number of domains per protein actually improves the clusterability of the protein families.…”
Section: Applicability To Other Species Data
mentioning
confidence: 99%
“…More sophisticated approaches detect domains using domain databases (Bateman et al 2002; Servant et al 2002; Mulder et al 2003), optionally use the order of domains as a fingerprint for the protein, and classify proteins into families on the basis of the presence of shared domains or similar domain architecture (Geer et al 2002). Classification of proteins into families using structural similarities (Holm and Sander 1996) is, at present, limited by the relatively small number of structures available in PDB ( Similarity-based clustering is a two-step process: one first needs to determine pairwise similarities between all pairs of proteins and then apply a clustering method that uses the similarity matrix to group proteins into clusters.…”
mentioning
confidence: 99%