Sequence weighting techniques aim to balance the redundant information contributed by subsets of similar sequences in multiple alignments. Traditional approaches apply the same weight to all positions of a given sequence, and hence assume equal efficiency of phylogenetic changes along the whole sequence. This restrictive assumption is not required by the new method described in this paper, PSIC (position-specific independent counts). Using statistical concepts, the number of independent observations (counts) of an amino acid type at a given alignment position is calculated from the overall similarity of the sequences that share that amino acid type at this position. This approach allows fast computation of position-specific sequence weights even for alignments containing hundreds of sequences. The PSIC approach has been applied to profile extraction and to the fold family assignment of protein sequences with known structures. The method proved very effective at finding distantly related sequences and, in many cases, more powerful than Hidden Markov Models or the profile methods in WiseTools and PSI-BLAST. The profile extraction routine is available on the WWW (http://www.bork.embl-heidelberg.de/PSIC or http://www.imb.ac.ru/PSIC).
Spider dragline silk possesses impressive mechanical and biochemical properties. It is synthesized by a pair of major ampullate glands in spiders and comprises two major structural proteins, spidroins 1 and 2. The relationship between the structure and the mechanical properties of spider silk is not well understood. Here, we modeled the complete process of spider silk assembly using two new recombinant analogs of spidroins 1 and 2. The artificial gene sequences encoding the hydrophobic core regions of spidroins 1 and 2 were designed using computer analysis of existing databases and mathematical modeling. Both proteins were expressed in Pichia pastoris and purified by cation exchange chromatography. Despite the absence of hydrophilic N- and C-termini, both purified proteins spontaneously formed nanofibrils and round micelles of about 1 μm in aqueous solutions. Electron microscopy revealed a helical nanofibril structure with a repeating motif of 40 nm. Thin films with an antiparallel beta-sheet structure were produced by electrospinning. In summary, we obtained artificial structures whose characteristics are promising for further biomedical applications, such as producing three-dimensional matrices for tissue engineering and drug delivery.
An exhaustive statistical analysis of the amino acid sequences at the carboxyl (C) and amino (N) termini of proteins and of coding nucleic acid sequences at the 5' side of the stop codons was undertaken. At the N ends, Met and Ala residues are over-represented at the first (+1) position whereas at positions 2 and 5 Thr is preferred. These peculiarities at N-termini are most probably related to the mechanism of initiation of translation (for Met) and to the mechanisms governing the life-span of proteins via regulation of their degradation (for Ala and Thr). We assume that the C-terminal bias facilitates fixation of the C ends on the protein globule by a preference for charged and Cys residues. The terminal biases, a novel feature of protein structure, have to be taken into account when molecular evolution, three-dimensional structure, initiation and termination of translation, protein folding and life-span are concerned. In addition, the bias of protein termini composition is an important feature which should be considered in protein engineering experiments.
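A positional bias of the kind described above can be illustrated by comparing how often a residue occurs at one terminal position with how often it occurs overall. The abstract does not say which statistic was used, so the simple frequency ratio below is an assumption, and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def terminal_enrichment(sequences, position, from_c_terminus=False):
    """Ratio of each residue's frequency at one terminal position to its
    overall frequency in the data set (assumed statistic, not the paper's).

    `position` is 1-based and counted from the N end, or from the C end
    when `from_c_terminus` is True. Sequences shorter than `position`
    contribute to the background only.
    """
    background = Counter()
    at_position = Counter()
    for seq in sequences:
        background.update(seq)
        if len(seq) >= position:
            idx = -position if from_c_terminus else position - 1
            at_position[seq[idx]] += 1
    total_bg = sum(background.values())
    total_pos = sum(at_position.values())
    return {
        aa: (count / total_pos) / (background[aa] / total_bg)
        for aa, count in at_position.items()
    }


# Invented toy sequences, for illustration only.
toy = ["MAKT", "MKTL", "MATC"]
n_term = terminal_enrichment(toy, position=1)
c_term = terminal_enrichment(toy, position=1, from_c_terminus=True)
```

A ratio above 1 means the residue is over-represented at that position relative to the whole data set. A real analysis would also need a significance test, which this sketch omits.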
Background: Sequence alignment algorithms are key instruments for computer-assisted studies of biopolymers. It is important to take into account the "quality" of the obtained alignments, i.e. how closely the algorithms restore the "gold standard" alignment (GS-alignment), which superimposes positions originating from the same position in the common ancestor of the compared sequences. A 3D-alignment is commonly used as an approximation of the GS-alignment, although this is not entirely justified. Among the currently used pair-wise alignment algorithms, the best quality is achieved by the optimal alignment algorithm based on affine penalties for deletions (the Smith-Waterman algorithm). Nevertheless, the relative merits of the local and global versions of the algorithm have not been studied. Results: Using model series of amino acid sequence pairs, we studied the relative "quality" of results produced by local and global alignments versus (1) the relative length of the similar parts of the sequences (their "cores") and of their nonhomologous parts, and (2) the relative positions of the core regions in the compared sequences. We obtained numerical values of the average quality (measured as accuracy and confidence) of the global and the local alignment methods for evolutionary distances between homologous sequence parts from 30 to 240 PAM, for core lengths from 10% to 70% of the total sequence length, and for all possible positions of the homologous sequence parts relative to the centers of the sequences. Conclusion: We identified criteria that specify the conditions under which the local or the global alignment algorithm is preferable, depending on the positions and relative lengths of the cores and nonhomologous parts of the sequences to be aligned.
When the core of one sequence was positioned directly opposite the core of the other sequence, the global algorithm was more stable than the local algorithm at longer evolutionary distances and with larger nonhomologous parts. Conversely, when the cores were positioned asymmetrically, the local algorithm was more stable than the global algorithm at longer evolutionary distances and with larger nonhomologous parts. This opens the possibility of creating a combined method that generates more accurate alignments.
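The difference between the global and local versions can be seen in a minimal dynamic-programming sketch that returns only the alignment score. It is a simplification: it uses a linear gap penalty rather than the affine penalties mentioned in the abstract, a plain match/mismatch score rather than a PAM matrix, and the function name, parameter values and test sequences are all hypothetical.

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score with a linear gap penalty.

    local=False: global alignment (both sequences aligned end to end).
    local=True:  local alignment (best-scoring pair of substrings).
    """
    # Row 0: the global version pays for leading gaps, the local one does not.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


# A shared core placed asymmetrically between unrelated flanks (invented data).
core = "ACDEFGHIK"
seq_a = "WWWW" + core
seq_b = core + "YYYY"
global_score = alignment_score(seq_a, seq_b)
local_score = alignment_score(seq_a, seq_b, local=True)
```

With these toy inputs the local score is 18 (nine matches), while the global score drops to 2 because the eight flanking residues must be aligned against gaps. This shows only how nonhomologous flanks penalize a global score; it does not measure alignment accuracy or confidence as the study does.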
We analyzed the periodic patterns in E. coli promoters and compared the distributions of the corresponding patterns in promoters and in the complete genome to elucidate their function. Except for the three-base periodicity, which coincides with that in the coding regions and grows stronger in the region downstream of the transcription start (TS), all other salient periodicities peak upstream of the TS. We found that helical periodicities with lengths of about the B-helix pitch (~10.2-10.5 bp) and the A-helix pitch (~10.8-11.1 bp) coexist in the genomic sequences. We mapped the distributions of stretches with A-, B-, and Z-like DNA periodicities onto the E. coli genome. All three periodicities tend to concentrate within non-coding regions as their intensity becomes stronger, and they prevail in the promoter sequences. Comparison with available experimental data indicates that promoters with the most pronounced periodicities may be related to the supercoiling-sensitive genes.
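One common way to quantify a sequence periodicity is the Fourier power of the base-indicator signals at the period of interest. The abstract does not name the spectral method used, so the sketch below is an assumed, generic approach; the function name and the toy sequence are hypothetical.

```python
import cmath


def periodicity_power(seq, period):
    """Fourier power at `period` (in bp, may be fractional), summed over
    the four base-indicator signals and normalized by sequence length."""
    if not seq:
        return 0.0
    power = 0.0
    for base in "ACGT":
        # Sum one unit phasor for every occurrence of this base.
        total = sum(
            cmath.exp(-2j * cmath.pi * n / period)
            for n, s in enumerate(seq)
            if s == base
        )
        power += abs(total) ** 2
    return power / len(seq)


# Invented sequence with a perfect three-base repeat.
toy = "ATG" * 30
power_3 = periodicity_power(toy, 3)
power_helical = periodicity_power(toy, 10.5)
```

On the toy repeat, the power at period 3 far exceeds the power at 10.5 bp. Scanning real promoter windows over a range of periods would additionally require a background model to judge which peaks are significant.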