Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require comparative analysis for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal dynamic programming methods are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are also prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools that stem from alignment-free methods based on the statistical analysis of word frequencies. We provide several clear examples to demonstrate applications and interpretations across several areas of alignment-free analysis, such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic. Additionally, we provide a detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression.
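To make the word-frequency idea concrete, the following is a minimal sketch of the D2 statistic in its basic form: the sum, over all words of length k, of the product of that word's counts in the two sequences. The function names `kmer_counts` and `d2_statistic` are illustrative assumptions and are not taken from the review, which may use normalized or otherwise refined variants.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq (case-insensitive)."""
    seq = seq.upper()
    # A sequence shorter than k yields an empty range, hence an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(seq_a: str, seq_b: str, k: int) -> int:
    """Basic (unnormalized) D2: sum over words of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b, so only shared words contribute.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2_statistic("ACGTACGT", "ACGT", k=2))
```

No alignment is computed here: the score depends only on word counts, so rearranging blocks of a sequence changes it far less than it would change an alignment score. The raw value grows with sequence length, so in practice it would usually be normalized before comparing sequences of different sizes.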
Post-translational modifications (PTMs) are important steps in the biosynthesis of proteins. Aside from their integral contributions to protein development, i.e. specialized proteolytic cleavage of regulatory subunits, the covalent addition of functional groups to proteins or the degradation of entire proteins, PTMs are also involved in enabling proteins to withstand and recover from temporary environmental stresses (heat shock, microgravity and many others). The literature provides evidence of thousands of recently discovered PTMs, many of which may contribute similarly (perhaps even interchangeably) to protein stress response. Although there are many PTM actors on the biological stage, our study determines that these PTMs are generally cast into organism-specific, preferential roles. In this work, we study the PTM compositions across the mitochondrial (Mt) and non-Mt proteomes of 11 diverse organisms to illustrate that each organism appears to have a unique list of PTMs, and an equally unique list of PTM-associated residue reaction sites (RSs), where PTMs interact with the protein. Despite the present limitation of available PTM data across different species, we apply existing and current protein data to illustrate particular organismal biases. We explore the relative frequencies of observed PTMs, the RSs and general amino-acid compositions of Mt and non-Mt proteomes. We apply these data to create networks and heatmaps to illustrate the evidence of bias. We show that the number of PTMs and RSs appears to grow along with organismal complexity, which may imply that environmental stress could play a role in this bias.