John-Marc Chandonia scite author profile

WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization.

Sequence logos were invented by Tom Schneider and Mike Stephens (Schneider and Stephens 1990; Shaner et al. 1993) to display patterns in sequence conservation, and to assist in discovering and analyzing those patterns. As an example, the accompanying figure (Fig. 1) shows how WebLogo can help interpret the sequence-specific binding of the protein CAP to its DNA recognition site (Schultz et al. 1991). Homodimeric DNA-binding proteins typically display a symmetric double hump in the DNA binding-site logo (Schneider and Stephens 1990), as shown in the figure. Deviations from this basic pattern can indicate additional features: a highly conserved residue in the center of such a pattern may indicate DNA distortion or base flipping (Schneider 2001); an unexpectedly high sequence conservation may be due to overlapping binding sites (Schneider et al. 1986). Protein logos can illuminate patterns of amino acid conservation that are often of structural or functional importance (Galperin et al. 2001; Rigden et al. 2003).
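
As a rough illustration of how stack and letter heights relate to conservation and frequency, the following Python sketch computes both for each column of a small, gap-free DNA alignment. It is not WebLogo's own code: the function names `column_heights` and `logo_heights` and the toy alignment are hypothetical, and the sketch omits the small-sample correction that sequence logos commonly apply. Conservation is taken here as the maximum possible entropy (log2 of the alphabet size) minus the observed entropy of the column.

```python
import math
from collections import Counter


def column_heights(column, alphabet_size=4):
    """Return (stack_height_bits, {symbol: letter_height_bits}) for one column.

    Assumes the column contains no gap characters and that no
    small-sample correction is wanted (a simplification).
    """
    counts = Counter(column)
    total = sum(counts.values())
    freqs = {sym: n / total for sym, n in counts.items()}
    # Shannon entropy of the observed symbol frequencies, in bits.
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    # Conservation: maximum possible entropy minus observed entropy.
    stack = math.log2(alphabet_size) - entropy
    # Each letter's height is its relative frequency times the stack height.
    return stack, {sym: f * stack for sym, f in freqs.items()}


def logo_heights(sequences, alphabet_size=4):
    """Compute heights for every column of equal-length aligned sequences."""
    return [column_heights(col, alphabet_size) for col in zip(*sequences)]


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]
    for pos, (stack, letters) in enumerate(logo_heights(alignment), start=1):
        shown = ", ".join(f"{s}={h:.2f}" for s, h in sorted(letters.items()))
        print(f"position {pos}: stack={stack:.2f} bits ({shown})")
```

For protein alignments the same sketch would be called with `alphabet_size=20`, giving a maximum stack height of log2(20), roughly 4.32 bits.
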
Sequence logos have also been used to display patterns in the BLOCKS protein sequence database (Henikoff et al. 1995), and in DNA-binding site motifs (Robison et al. 1998; Nelson et al. 2002), to analyze splice sites (Stephens and Schneider 1992; Emmert et al. 2001), and in a variety of other contexts. Additional examples, and the raw data for the example presented here, can be found on the WebLogo examples page (http://weblogo.berkeley.edu/examples.html).

The logo generation form (http://weblogo.berkeley.edu/logo.cgi) can process RNA, DNA, or protein multiple sequence alignments provided in either FASTA (Pearson and Lipman 1988) or CLUSTAL (Higgins and Sharp 1988) formats. If the user does not explicitly specify the sequence type, then WebLogo will make a determination on the basis of the symbols found within the sequences. A logo represents each column of the alignment by a stack of letters, with the height of each letter proportional to the observed frequency of the corresponding amino acid or nucleotide, and the overall height of each stack proportional to the sequence conservation, measured in bits, at tha...

Data growth and its impact on the SCOP database: new developments

Andreeva, Howorth, Chandonia, et al. 2007. Nucleic Acids Research.

918

883

The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with a rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures by their detected relationships at Family and Superfamily levels in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to the information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.

KBase: The United States Department of Energy Systems Biology Knowledgebase

et al. 2018
