2006
DOI: 10.1093/nar/gkj106

SIMAP: the similarity matrix of proteins

Abstract: Similarity Matrix of Proteins (SIMAP) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system, which is based on FASTA heuristics and the Smith–Waterman algorithm. Our ProtInfo system allo…

Cited by 42 publications (41 citation statements)
References 28 publications
“…ORFs that did not map to those data sets were then self-clustered using cd-hit (using the same parameters as above), resulting in a total of 27 685 POV PCs containing ≥20 ORFs each. These PCs were then annotated using the Similarity Matrix of Proteins (Rattei et al, 2006) to assign taxonomy (NCBI) and function (TIGRFAM), with additional functional information obtained from eggNOG (Powell et al, 2012; 4 March 2012).…”
Section: Methods (mentioning)
confidence: 99%
“…Reads that were present in a virome just once (k-mer = 1) were removed from the analysis given a higher probability of contamination (13) per the discussion above. The results were then used to compute an Euler diagram using the venneuler function (56) in the R statistical software (54).…”
Section: Construction Of Euler Diagrams Depicting Shared Read Content (mentioning)
confidence: 99%
“…Precalculated TMHMM [62] and SignalP [63] annotations were obtained for all proteins of the TrEMBL dataset using the SIMAP database [64]. To exclude mispredicted TMDs, the TMHMM and SignalP predictions were compared for every protein and all TMDs overlapping by at least eight amino acids with a predicted signal peptide were eliminated. All proteins containing one predicted TMD after this elimination step were combined with the extracted proteins from the Swiss-Prot database resulting in an initial dataset of 167,125 bitopic proteins.…”
Section: Database Analysis (mentioning)
confidence: 99%
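
The elimination step described in the last statement (dropping any predicted TMD that overlaps a predicted signal peptide by at least eight residues, then keeping proteins with exactly one remaining TMD) can be illustrated with a minimal Python sketch. This is not the citing authors' code: the `Segment` type, the function names, and the use of 1-based inclusive coordinates are assumptions made purely for illustration, and the sketch does not parse real TMHMM or SignalP output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """A predicted region on a protein (hypothetical type; 1-based, inclusive)."""
    start: int
    end: int


def overlap_length(a: Segment, b: Segment) -> int:
    """Number of residues shared by two inclusive segments (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def filter_tmds(tmds: List[Segment],
                signal_peptide: Optional[Segment],
                min_overlap: int = 8) -> List[Segment]:
    """Drop TMDs that overlap the signal peptide by >= min_overlap residues."""
    if signal_peptide is None:
        return list(tmds)
    return [t for t in tmds
            if overlap_length(t, signal_peptide) < min_overlap]


def is_bitopic(tmds: List[Segment],
               signal_peptide: Optional[Segment],
               min_overlap: int = 8) -> bool:
    """True if exactly one predicted TMD remains after the elimination step."""
    return len(filter_tmds(tmds, signal_peptide, min_overlap)) == 1
```

For example, with an assumed signal peptide at residues 1 to 22, a TMD predicted at residues 5 to 27 shares 18 residues with it and is discarded, whereas a TMD at residues 16 to 38 shares only 7 residues and is kept.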