PASS2: an automated database of protein alignments organised as structural superfamilies

Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

doi:10.1186/1471-2105-5-35

Cited by 34 publications (29 citation statements)

References 32 publications


“…We have employed structural entries from the PASS2 database to understand and improve sequence data mining techniques, where the searches are specifically meant for distant homologues [19]. SCOP is a good benchmark dataset to evaluate many homology detection methods [6], but removing redundancy with respect to similar entries reduces the search time.…”

Section: Discussion (mentioning, confidence: 99%)

“…PASS2 [19], based on the SCOP database and the ASTRAL compendium [20], uses protein structural entries from the SCOP superfamily with less than 40% mutual sequence identity. Three superfamilies each from four structural classes from the PASS2 database were selected for this study (Table 1).…”

Section: Materials and Methodology (mentioning, confidence: 99%)

“…We have considered multi-member superfamilies from the PASS2 database [19]. PASS2 is a database of structural alignments of protein domains in a SCOP superfamily which share less than 40% mutual sequence identity.…”

Section: Introduction (mentioning, confidence: 99%)
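The "less than 40% mutual sequence identity" criterion mentioned in these excerpts can be illustrated with a minimal Python sketch. It assumes one common definition of percent identity: identical residues divided by the alignment columns where neither sequence has a gap. The exact definition PASS2 and ASTRAL apply is not given on this page, and the function names and toy sequences below are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two rows of the same alignment ('-' marks a gap).

    Assumption: identity is computed over gap-free columns only; other
    definitions divide by the shorter sequence or the full alignment length.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def below_mutual_identity(alignment: dict[str, str], cutoff: float = 40.0) -> bool:
    """True if every pair of members shares less than `cutoff` percent identity."""
    return all(
        percent_identity(alignment[p], alignment[q]) < cutoff
        for p, q in combinations(alignment, 2)
    )


# Hypothetical toy alignment of three domains.
toy = {"dom_a": "ACDE-FGHIK", "dom_b": "ACQEWLGHPR", "dom_c": "TVDKQFSNIM"}
print(below_mutual_identity(toy))  # False: dom_a and dom_b share 5/9, about 55.6%
```

Dividing by gap-free columns rather than by sequence length is a design choice: other denominators give different values for the same pair, so whether a set passes a 40% cutoff depends on the definition used.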


Improved performance of sequence search algorithms in remote homology detection

2013

Self Cite


The protein sequence space is vast and diverse, spanning many different families. Biologically meaningful relationships exist between proteins at the superfamily level. However, it is highly challenging to establish convincing relationships at the superfamily level by means of simple sequence searches. It is necessary to design a rigorous sequence search strategy to establish remote homology relationships and achieve high coverage. We have used iterative profile-based methods, along with constraints of sequence motifs, to specify search directions. We address the importance of multiple start points (queries) to achieve high coverage at the protein superfamily level. We have devised strategies to employ a structural regime to search sequence space with good specificity and sensitivity. We employ two well-known sequence search methods, PSI-BLAST and PHI-BLAST, with multiple queries and multiple patterns to enhance homologue identification at the structural superfamily level. The study suggests that multiple queries improve sensitivity, while a pattern-constrained iterative sequence search becomes stringent at the initial stages, thereby driving the search in a specific direction while also achieving high coverage. This data mining approach has been applied to the entire structural superfamily database.


Section: Discussion (mentioning, confidence: 99%)

Section: Materials and Methodology (mentioning, confidence: 99%)


Improved performance of sequence search algorithms in remote homology detection

2013

Self Cite


“…Protein domains in primary structural databases such as PDB (Protein Data Bank) [1] have been grouped according to structural hierarchy such as protein folds, superfamilies and families in databases like CATH (Class, Architecture, Topology, Homologous superfamily) [2] and SCOP (Structural Classification of Proteins) [3]. There are also secondary databases like PASS2 (Protein Alignments organised as Structural Superfamilies) [4-6], which follows the SCOP hierarchy and provides highly accurate structure-based sequence alignments for protein domain superfamilies. It is widely accepted that protein domains which cluster under a superfamily generally adopt similar tertiary structures, in spite of having low sequence identity.…”

Section: Introduction (mentioning, confidence: 99%)

“…These length changes have been caused by “indels” (insertions/deletions) in protein sequences, which have in turn been used to follow updates of secondary databases derived from SCOP. Earlier studies by our group had examined the length variations in 353 multi-membered superfamilies from the PASS2.2 database [4], using an objective algorithm called CUSP (Conserved Units of Structures in Proteins) [7], and analysed length variations and their consequences on the functionality of protein domains [8]. Such analyses have been helpful to recognise and classify superfamilies into 64 “Length-deviant” (ones which can tolerate large {i.e.…”

Section: Introduction (mentioning, confidence: 99%)

Structural updates of alignment of protein domains and consequences on evolutionary models of domain superfamilies

2013

Self Cite


Background: The influx of newly determined crystal structures into primary structural databases is increasing at a rapid pace. This leads to updates of the primary databases and of the secondary databases that depend on them, which makes large-scale analysis of structures even more challenging. Hence, it becomes essential to compare the data replaced and the new data included between two updates. PASS2 is a database that retains structure-based sequence alignments of protein domain superfamilies and relies on the SCOP database for its hierarchy and definition of superfamily members. Since accurate alignments of distantly related proteins are useful evolutionary models for depicting variations within protein superfamilies, this study aims to trace the changes in data between PASS2 updates.

Results: In this study, differences in superfamily compositions, family constituents and length variations between different versions of PASS2 have been tracked. Studying length variations in protein domains, which are introduced by indels (insertions/deletions), is important because these indels act as evolutionary signatures, introducing variations in substrate specificity and domain interactions and sometimes even regulating protein stability. With the objective of classifying the nature and source of variations in the superfamilies during transitions between different versions of PASS2, an increase in the length rigidity of the superfamilies is observed in the recent version. To study such length-variant superfamilies in detail, an improved classification approach is also presented, which divides the superfamilies into distinct groups based on their extent of length variation.

Conclusions: An objective study of the transition between database updates, with detailed investigation of the new and old members and examination of their structural alignments, is non-trivial and will help researchers in designing experiments on specific superfamilies, in various modelling studies, in linking representative superfamily members to the rapidly expanding sequence space and in evaluating the effects of length variations of new members in drug target proteins. The improved objective classification scheme developed here would be useful in future for automatic analysis of length variation when databases are updated, or even across different secondary databases.


Evolution of binding sites for zinc and calcium ions playing structural roles

2007


The geometry of metal coordination by proteins is well understood, but the evolution of metal binding sites has been less studied. Here we present a study on a small number of well-documented structural calcium and zinc binding sites, concerning how the geometry diverges between relatives, how often nonrelatives converge towards the same structure, and how often these metal binding sites are lost in the course of evolution. Both calcium and zinc binding-site structures are observed to be conserved; structural differences between those atoms directly involved in metal binding in related proteins are typically less than 0.5 Å root mean square deviation, even in distant relatives. Structural templates representing these conserved calcium and zinc binding sites were used to search the Protein Data Bank for cases where unrelated proteins have converged upon the same residue selection and geometry for metal binding. This allowed us to identify six "archetypal" metal binding site structures: two archetypal zinc binding sites, both of which had independently evolved on a large number of occasions, and four diverse archetypal calcium binding sites, each of which had evolved independently on only a handful of occasions. We found that it was common for distant relatives of metal-binding proteins to lack metal-binding capacity. This occurred for 13 of the 18 metal binding sites we studied, even though in some of these cases the original metal had been classified as "essential for protein folding." For most of the calcium binding sites studied (seven out of eleven cases), the lack of metal binding in relatives was due to point mutation of the metal-binding residues, whilst for zinc binding sites, lack of metal binding in relatives always involved more extensive changes, with loss of secondary structural elements or loops around the binding site.

