2004
DOI: 10.1101/gr.2079204

Visualizing Sequence Similarity of Protein Families

Abstract: Classification of proteins into families is one of the main goals of functional analysis. Proteins are usually assigned to a family on the basis of the presence of family-specific patterns, domains, or structural elements. Whereas proteins belonging to the same family are generally similar to each other, the extent of similarity varies widely across families. Some families are characterized by short, well-defined motifs, whereas others contain longer, less-specific motifs. We present a simple method for visual…

Cited by 8 publications (7 citation statements)
References 43 publications
“…We defined sensitivity as the average percentage of how many of the proteins with same InterPro domain structure (see below) are contained in a single cluster. In parallel, specificity was defined as the average percentage of how many of the proteins in a cluster have the same InterPro domain structure [ 22 ]. We selected for further analysis a clustering where r = 3.1, as a good compromise between sensitivity and specificity.…”
Section: Results
mentioning
confidence: 99%
“…Each cluster's InterPro entry structure was defined as the most common InterPro entry structure found in its member proteins. Subsequently, for each tested r specificity and sensitivity of the clustering (Figure 1 ) were counted as proposed in [ 22 ]. A cluster's specificity was defined as the percentage of proteins in a cluster having the same structure as the cluster's structure.…”
Section: Methods
mentioning
confidence: 99%
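The sensitivity and specificity definitions quoted above can be illustrated with a minimal Python sketch. It rests on one reading of those statements and is not the cited authors' code: specificity is taken as the share of a cluster's members that carry the cluster's most common domain structure, sensitivity as the largest share of proteins with a given domain structure that falls into a single cluster, and both are averaged and reported as percentages. The function names, the toy protein IDs, and the structure labels `A` and `B` are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_specificity(clusters, structure_of):
    """Average % of members sharing their cluster's most common structure."""
    shares = []
    for members in clusters.values():
        counts = Counter(structure_of[p] for p in members)
        # Size of the dominant structure divided by the cluster size.
        shares.append(counts.most_common(1)[0][1] / len(members))
    return 100.0 * sum(shares) / len(shares)


def cluster_sensitivity(clusters, structure_of):
    """Average % of same-structure proteins that sit in one single cluster."""
    by_structure = defaultdict(Counter)
    for cluster_id, members in clusters.items():
        for p in members:
            by_structure[structure_of[p]][cluster_id] += 1
    # For each structure, take the cluster holding most of its proteins.
    shares = [max(c.values()) / sum(c.values()) for c in by_structure.values()]
    return 100.0 * sum(shares) / len(shares)


# Toy data (hypothetical): two clusters, two domain structures.
clusters = {"c1": ["p1", "p2", "p3"], "c2": ["p4", "p5"]}
structure_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B", "p5": "B"}

spec = cluster_specificity(clusters, structure_of)
sens = cluster_sensitivity(clusters, structure_of)
print(f"specificity={spec:.1f}%  sensitivity={sens:.1f}%")
```

Evaluating both values for each candidate clustering parameter gives the kind of trade-off curve from which the citing authors report choosing a compromise setting.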
“…mcl clustering was run over the range of possible inflation values. For each inflation value, a sensitivity and specificity was calculated for the clustering as previously described [ 32 , 79 ]. In order to calculate these, other secondary Pfam matches were determined for the member proteins of the Pfam under study and the most variable secondary Pfam selected for sensitivity and specificity calculations.…”
Section: Methods
mentioning
confidence: 99%
“…Minimum identity of 60% was selected based on previous research on functional annotation transfer (Radivojac et al, 2013); (ii) An expanded genome search using reduced strictness (>60% identity across >80% of the query sequence, resulting in a total identity of >48%) was conducted to provide insight into the potential host range of metagenome-derived proteins with no close matches. The coverage and identity used in the expanded genome search remain more stringent than those previously identified to be optimum for protein family identification and clustering (Veeramachaneni & Makałowski, 2004). Once annotated, genomes were linked to their taxonomic lineage using the NCBI taxonomy information, implemented in MGkit (Rubino et al, 2014), and the Genome Taxonomy database (GTDB), version 86 (Parks et al, 2018).…”
mentioning
confidence: 99%
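The ">48%" figure in the statement above follows from multiplying the two thresholds: 60% identity over 80% of the query gives 0.60 × 0.80 = 0.48 as the minimum fraction of the whole query that is identically matched. A small Python sketch of that arithmetic, with hypothetical function and parameter names and assuming identity and coverage are expressed as fractions:

```python
def total_identity_floor(identity, coverage):
    """Lower bound on the identically matched fraction of the whole query."""
    return identity * coverage


def passes_expanded_search(identity, coverage,
                           min_identity=0.60, min_coverage=0.80):
    """True if a hit exceeds both thresholds quoted in the citation statement."""
    return identity > min_identity and coverage > min_coverage


floor = total_identity_floor(0.60, 0.80)
print(f"total identity floor: {floor:.2f}")
```

A hit that passes both strict inequalities therefore always has a total identity above this floor.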