The Helicobacter pylori genome: From sequence analysis to structural and functional predictions

Pawłowski, Krzysztof; Zhang, Baohong; Rychlewski, Leszek; Godzik, Adam

doi:10.1002/(sici)1097-0134(19990701)36:1<20::aid-prot2>3.3.co;2-o

Cited by 5 publications

(8 citation statements)

References 37 publications


“…Traditionally, this approach concentrates on specific protein families. With thousands of fold predictions available on genome scale (Casari et al; Fischer & Eisenberg, 1997; Jones, 1998; Pawlowski et al, 1999; Rychlewski et al, 1998, 1999), the automated alignment analysis becomes increasingly important. An automated method to verify the conservation of the functional site residues for alignments from sequence analysis and fold prediction methods was used to analyze the results of the previous fold prediction for proteins from the E. coli genome.…”

Section: Discussion

mentioning

confidence: 99%

From fold predictions to function predictions: Automation of functional site conservation analysis for functional genome predictions

et al. 1999

Self Cite


A database of functional sites for proteins with known structures, SITE, is constructed and used in conjunction with a simple pattern matching program SiteMatch to evaluate possible function conservation in a recently constructed database of fold predictions for Escherichia coli proteins (Rychlewski L et al., 1999, Protein Sci 8:614-624). In this and other prediction databases, fold predictions are based on algorithms that can recognize weak sequence similarities and putatively assign new proteins into already characterized protein families. It is not clear whether such sequence similarities arise from distant homologies or general similarity of physicochemical features along the sequence. Leaving aside the important question of the nature of relations within fold superfamilies, it is possible to assess possible function conservation by looking at the pattern of conservation of crucial functional residues. SITE consists of a multilevel function description based on structure annotations and structure analyses. In particular, active site residues, ligand binding residues, and patterns of hydrophobic residues on the protein surface are used to describe different functional features. SiteMatch, a simple pattern matching program, is designed to check the conservation of residues involved in protein activity in alignments generated by any alignment method. Here, this procedure is used to study conservation of functional features in alignments between protein sequences from the E. coli genome and their optimal structural templates. The optimal templates were identified and alignments taken from the database of genomic structural predictions that was described in a previous publication (Rychlewski L et al., 1999, Protein Sci 8:614-624). An automated assessment of function conservation is used to analyze the relation between fold and function similarity for a large number of fold predictions. For instance, it is shown that identifying low significance predictions with a high level of functional residue conservation can be used to extend the prediction sensitivity for fold prediction methods. Over 100 new fold/function predictions in this class were obtained in the E. coli genome. At the same time, about 30% of our previous fold predictions are not confirmed as function predictions, further highlighting the problem of function divergence in fold superfamilies.

Keywords: fold assignments; function predictions; genome analysis

The prediction of protein folds and functions from sequence is the "Holy Grail" of molecular biology. With improving sequencing methods, the number of known protein sequences has increased over 10-fold in the last two years and is expected to grow even faster in the next several years. The experimental characterization of new proteins is also improving, but at a much slower rate. Consequently, computer analysis of new sequences, particularly aiming at recognition of similarity to the already characterized protein families, has become a primary tool for analysis of new sequences. For instance, most newl...
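The abstract above describes checking whether residues annotated as functional in a structural template are conserved in the aligned query sequence. The following is only a minimal sketch of that idea, not the SiteMatch program itself: the function name `site_conservation`, the input layout (two gapped strings plus 1-based template positions), and the strict identity criterion are all assumptions made for illustration. The real program may well accept conservative substitutions or use a richer site description.

```python
def site_conservation(template_aln, query_aln, site_positions):
    """Count template functional-site residues that are identical in the query.

    template_aln, query_aln: equal-length aligned strings; '-' marks a gap.
    site_positions: 1-based positions in the ungapped template sequence
                    (hypothetical input format, chosen for this sketch).
    Returns (conserved, total).
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(site_positions)
    conserved = 0
    pos = 0  # running position in the ungapped template
    for t, q in zip(template_aln, query_aln):
        if t == "-":
            continue  # gap in the template: no template residue in this column
        pos += 1
        # Assumption: "conserved" means strict identity; a query gap never matches.
        if pos in wanted and q == t:
            conserved += 1
    return conserved, len(wanted)


if __name__ == "__main__":
    template = "MKH-DSGLC"
    query = "MRHADS-LC"
    conserved, total = site_conservation(template, query, [3, 5, 6, 8])
    print(f"{conserved} of {total} site residues conserved")
```

A ratio of this kind could then be compared against a threshold to flag low-significance fold predictions whose functional residues are nevertheless preserved; any particular threshold would be a further assumption.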



“…Surprisingly, despite their different points of origin, both profile and threading methods seem to give similar results and reliability estimates, at least in limited tests (Rychlewski et al, 1998). Therefore, in several previous papers (Rychlewski et al, 1998, 1999; Pawlowski et al, 1999) we have used a sequence based method for fold assignments.…”

mentioning

confidence: 89%

“…The third algorithm is a profile-profile alignment algorithm BASIC (Rychlewski et al, 1998) developed in our group and used previously to assign folds to proteins from several genomes (Rychlewski et al, 1998, 1999; Pawlowski et al, 1999), in a fold prediction competition CASP3 (Murzin, 1999) and experimental fold prediction competition between automated fold prediction servers CAFASP (Kelley et al, 1999).…”

Section: Benchmark Statistics

mentioning

confidence: 99%

“…In a further step, different profile building and comparing strategies are tested on this benchmark. Among others, we present details of the methods used in our group to assign folds to proteins from several genomes (Rychlewski et al, 1998, 1999; Pawlowski et al, 1999). Here, we compare it to the newest version of the PSI-BLAST algorithm (Altschul et al, 1997) and to the next generation profile-profile alignment method developed in our group.…”

mentioning

confidence: 99%


Comparison of sequence profiles. Strategies for structural predictions using sequence information

et al. 2000

Self Cite


Distant homologies between proteins are often discovered only after three-dimensional structures of both proteins are solved. The sequence divergence for such proteins can be so large that simple comparison of their sequences fails to identify any similarity. A new generation of sensitive alignment tools uses averaged sequences of entire homologous families (profiles) to detect such homologies. Several algorithms, including the newest generation of BLAST algorithms and BASIC, an algorithm used in our group to assign fold predictions for proteins from several genomes, are compared to each other on a large set of structurally similar proteins with little sequence similarity. Proteins in the benchmark are classified according to the level of their similarity, which allows us to demonstrate that most of the improvement of the new algorithms is achieved for proteins with strong functional similarities, with almost no progress in recognizing distant fold similarities. It is also shown that details of profile calculation strongly influence its sensitivity in recognizing distant homologies. The most important choice is how to include information from diverging members of the family, avoiding generating false predictions, while accounting for entire sequence divergence within a family. PSI-BLAST takes a conservative approach, deriving a profile from core members of the family, providing a solid improvement with almost no false predictions. BASIC strives for better sensitivity by increasing the weight of divergent family members and paying the price in lower reliability. A new FFAS algorithm introduced here uses a new procedure for profile generation that takes into account all the relations within the family and matches BASIC sensitivity with PSI-BLAST-like reliability.

Keywords: fold recognition; PSI-BLAST; sequence profile

A simple observation that homologous proteins have similar folds and strong similarities in their functions forms a cornerstone of most methods of predicting protein structure and function from sequence. Structure and/or function prediction is usually based on establishing homology between a newly sequenced protein and an already known and characterized protein group. Once the homology is established, it is possible to make various inferences about the structure, activity, and function of the new protein. Unfortunately, deciding whether or not two proteins are homologous, i.e., related by evolution, is not always easy. The usual approach is to look for similarity between their amino acid sequences. Dynamic programming (Needleman & Wunsch, 1970) provides a very powerful and fast method to compare two sequences. Extensive experience with this approach established quite precise thresholds when the similarity is strong enough to infer that the two proteins are related. The rule of thumb is that for proteins of the approximate length of 100 amino acids, two proteins with sequence similarity around the level of 25% of identities have about 50% chance of being related. The range of sequence similarity ...
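The introduction above refers to dynamic programming (Needleman & Wunsch, 1970) as the standard way to compare two sequences. As a rough illustration only, the sketch below computes a global alignment score with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) and the function name `needleman_wunsch_score` are assumptions for this example; practical protein comparisons use substitution matrices and tuned gap penalties, and the profile methods discussed in the abstract go further still.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (linear gap penalty).

    Scoring parameters are illustrative assumptions, not values from the paper.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
```

Keeping only the score, rather than the full traceback, is a deliberate simplification here; recovering the alignment itself would require storing the whole table or the chosen moves.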


“…To date, several groups have attempted computational protein folding on a genome-wide scale. These efforts include modeling of the yeast genome [227], analysis of folds in the worm genome [228], and modeling of a number of bacterial genomes [229,230]. Yokoyama and colleagues have initiated a search for all "natively-folded" proteins on a large scale [231], and Baker and colleagues have accomplished a proof-of-concept for ab initio folding [232], but neither of these has yet been applied to a complete genome.…”

Section: Role Of Structure Prediction In the Genomic Era A Genome

mentioning

confidence: 99%

The Protein Folding Problem: A Biophysical Enigma

Fetrow, Giammona, Koliński et al. 2002

CPB


Protein folding, the problem of how an amino acid sequence folds into a unique three-dimensional shape, has been a long-standing problem in biology. The success of genome-wide sequencing efforts has increased the interest in understanding the protein folding enigma, because realizing the value of the genomic sequences rests on the accuracy with which the encoded gene products are understood. Although a complete understanding of the kinetics and thermodynamics of protein folding has remained elusive, there has been considerable progress in techniques to predict protein structure from amino acid sequences. The prediction techniques fall into three general classes: comparative modeling, threading and ab initio folding. The current state of research in each of these three areas is reviewed here in detail. Efforts to apply each method to proteome-wide analysis are reviewed, and some of the key technical hurdles that remain are presented. Protein folding technologies, while not yet providing a full understanding of the protein folding process, have clearly progressed to the point of being useful in enabling structure-based annotation of genomic sequences.

