* Genomic predictions were filtered by retaining only hits to the InterPro database and cleaned of cephalopod contamination with BLAST searches against the NCBI nr database.
** Transcriptome assemblies were filtered with BLAST searches against the RefSeq database as detailed in Materials and Methods, Section "Assembly and Filtering of Dicyemid Sequences."
Background: Modeling a complex biological process can explain the results of experimental studies and help predict its characteristics. One such process is transcription in the presence of competing RNA polymerases, which involves collisions between RNA polymerases followed by transcription termination.
Results: A mathematical and computer simulation model is developed to describe the competition of RNA polymerases during transcription of genes on complementary DNA strands. For example, in barley (Hordeum vulgare), polymerase competition occurs in the locus containing the plastome genes psbA, rpl23, and rpl2 and four bacterial-type promoters. In heat shock experiments on isolated chloroplasts, a twofold decrease in psbA transcripts and an even larger increase in rpl23-rpl2 transcripts were observed, which the model reproduces well. The model predictions are in good agreement with virtually all relevant experimental data (knockout, heat shock, chromatogram data, etc.). The model makes it possible to hypothesize a mechanism of cell response to knockout and heat shock, as well as a mechanism of gene expression regulation in the presence of RNA polymerase competition. The model is implemented for multiprocessor platforms with MPI and is supported on Linux and MS Windows. The source code, written in C++, is available under the GNU General Public License from the laboratory website. A user-friendly GUI version is also provided at http://lab6.iitp.ru/en/rivals.
Conclusions: The developed model is in good agreement with virtually all relevant experimental data.
The model can be applied to estimate the intensities with which the holoenzyme and the phage-type RNA polymerase bind to their promoters, using data on gene transcription levels. It can also predict characteristics of RNA polymerases and of the transcription process that are difficult to measure directly, e.g., the intensity (frequency) of holoenzyme binding to the promoter in relation to its nucleotide composition and the type of σ-subunit, the number of aborted transcription initiations, etc. The model can be used to make functional predictions, e.g., the heat shock response in isolated chloroplasts and changes in gene transcription levels under knockout of different σ-subunits or RNA polymerases or due to gene expression regulation.
Reviewers: This article was reviewed by Dr. Anthony Almudevar, Dr. Aniko Szabo, Dr. Yuri Wolf (nominated by Dr. Peter Olofsson) and Prof. Marek Kimmel.
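To make the idea of polymerase competition on complementary strands concrete, here is a minimal toy sketch in Python. It is not the authors' model (which is written in C++ with MPI) and it is not fitted to any data. The lattice representation, the per-step binding probabilities `p_fwd` and `p_rev`, and the rule that a head-on collision removes one of the two polymerases at random are all assumptions made for illustration.

```python
import random

def simulate(length, steps, p_fwd, p_rev, seed=0):
    """Toy head-on competition on a locus of `length` positions.

    Forward polymerases bind at position 0 and move right; reverse
    polymerases bind at position length - 1 and move left.
    All parameters and rules here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    fwd, rev = [], []                  # current positions on each strand
    done_fwd = done_rev = collisions = 0
    for _ in range(steps):
        # Every bound polymerase advances by one position.
        fwd = [x + 1 for x in fwd]
        rev = [y - 1 for y in rev]
        # Polymerases that ran off the end have finished a transcript.
        done_fwd += sum(x >= length for x in fwd)
        done_rev += sum(y < 0 for y in rev)
        fwd = [x for x in fwd if x < length]
        rev = [y for y in rev if y >= 0]
        # New polymerases bind to their promoters with the given probabilities.
        if rng.random() < p_fwd:
            fwd.append(0)
        if rng.random() < p_rev:
            rev.append(length - 1)
        # Head-on collision: the leading forward and leading reverse
        # polymerases have met or crossed; one of them terminates (assumed rule).
        while fwd and rev and max(fwd) >= min(rev):
            collisions += 1
            if rng.random() < 0.5:
                fwd.remove(max(fwd))
            else:
                rev.remove(min(rev))
    return done_fwd, done_rev, collisions

# Without a competing promoter every forward polymerase finishes.
print(simulate(length=10, steps=100, p_fwd=1.0, p_rev=0.0))  # (90, 0, 0)
# With competition, collisions terminate some transcripts early.
print(simulate(length=10, steps=100, p_fwd=1.0, p_rev=0.5))
```

Comparing transcript counts with and without the competing promoter illustrates, in a very simplified way, how changing the binding intensity on one strand can shift transcript levels on the other.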
Background: Perfectly or highly conserved DNA elements have been found in vertebrates, invertebrates, and plants by various methods. However, little is known about such elements in protists. The evolutionary distance between apicomplexans can be very large, in particular because of the positive selection pressure acting on them. This complicates the identification of highly conserved elements in alveolates, a difficulty that the proposed algorithm overcomes.
Results: A novel algorithm is developed to identify highly conserved DNA elements. It is based on the identification of dense subgraphs in a specially built multipartite graph (whose parts correspond to genomes). Specifically, the algorithm does not rely on genome alignments or on pre-identified perfectly conserved elements; instead, it performs a fast search for pairs of words (in different genomes) of maximum length whose difference is below the specified edit distance. Each such pair defines an edge whose weight equals the maximum (or total) length of the words assigned to its ends. The graph composed of these edges is then compacted by merging some of its edges and vertices. The dense subgraphs are identified by a cellular automaton-like algorithm; each subgraph defines a cluster composed of similar inextensible words from different genomes. Almost all clusters are considered predicted highly conserved elements. The algorithm is applied to the nuclear genomes of the superphylum Alveolata, and the corresponding phylogenetic tree is built and discussed.
Conclusion: We proposed an algorithm for the identification of highly conserved elements. The multitude of identified elements was used to infer the phylogeny of Alveolata.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1257-5) contains supplementary material, which is available to authorized users.
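The edge-building step described above can be sketched in a few lines of Python. This is a brute-force illustration only, not the authors' fast search: it uses fixed-length words of length `k` instead of words of maximum length, treats "below the specified edit distance" as `<= max_dist`, and skips graph compaction and the dense-subgraph search. The function names and the toy genomes are hypothetical.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def build_edges(genomes, k, max_dist):
    """Connect similar length-k words from *different* genomes.

    Vertices are (genome name, start position); the edge weight is the
    word length. Edges within one genome are never created, which is
    what makes the graph multipartite.
    """
    edges = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(genomes.items(), 2):
        for i in range(len(seq_a) - k + 1):
            for j in range(len(seq_b) - k + 1):
                if edit_distance(seq_a[i:i + k], seq_b[j:j + k]) <= max_dist:
                    edges.append(((name_a, i), (name_b, j), k))
    return edges

# Toy input: the only exact shared 4-letter word is ACGT.
toy = {"g1": "ACGTAC", "g2": "TTACGTTT"}
print(build_edges(toy, k=4, max_dist=0))  # [(('g1', 0), ('g2', 2), 4)]
```

The quadratic scan is only workable for toy inputs; real genomes would need an indexed or seeded search, which is presumably what the paper's "fast search" provides.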
We report a database of plastid protein families from red algae, secondary and tertiary rhodophyte-derived plastids, and Apicomplexa, constructed with a novel method for inferring orthology. The families contain proteins with maximal sequence similarity and minimal paralogous content. The database contains 6509 protein entries, 513 families, and 278 nonsingletons (of which 230 are paralog-free; among the remaining 48, 46 contain at most two proteins per species and 2 contain at most three proteins per species). The method is compared with other approaches. Expression regulation of the moeB gene is studied using this database and the model of RNA polymerase competition. An analogous database for green algae and their symbiotic descendants, together with applications based on it, was published earlier.
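The family categories quoted above (singleton, paralog-free, at most two or three proteins per species) can be expressed as a small classification rule. The sketch below assumes a family is given as a list of `(species, protein_id)` pairs; that representation and all identifiers are hypothetical and are not taken from the database itself.

```python
from collections import Counter

def classify_family(family):
    """Label a family by its size and its maximum number of proteins per species."""
    if len(family) == 1:
        return "singleton"
    copies = Counter(species for species, _ in family)
    worst = max(copies.values())
    if worst == 1:
        return "paralog-free"
    return f"at most {worst} proteins per species"

# Hypothetical toy families.
print(classify_family([("sp1", "p1")]))                               # singleton
print(classify_family([("sp1", "p1"), ("sp2", "p2")]))                # paralog-free
print(classify_family([("sp1", "p1"), ("sp1", "p2"), ("sp2", "p3")])) # at most 2 proteins per species

# Consistency check of the counts reported in the text.
assert 230 + 48 == 278 and 46 + 2 == 48
```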
Protein clustering is useful for refining protein annotations and for searching for proteins by their phylogenetic profile. We have clustered the proteins encoded in the plastomes of Rhodophyta and of other plastid-containing species related to the Rhodophyta branch. The resulting database, together with a cluster search by protein phylogenetic profile, is available at http://lab6.iitp.ru/ppc/redline. Using this database, plastome-encoded proteins specific to small taxonomic groups of algae and protozoa have been found, and RNA polymerases in the nuclear genomes of Apicomplexa have been searched for and analyzed.
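A search by phylogenetic profile can be pictured as a set query: return the clusters that occur in every required species and in none of the excluded ones. The sketch below is a generic illustration in Python. It does not reproduce the interface of the database at the URL above, and the cluster and species names are hypothetical.

```python
def search_by_profile(clusters, present=(), absent=()):
    """Return ids of clusters found in all `present` species and no `absent` species.

    `clusters` maps a cluster id to the set of species in which the
    cluster has at least one protein (an assumed, simplified layout).
    """
    present, absent = set(present), set(absent)
    return sorted(
        cid for cid, species in clusters.items()
        if present <= species and not (absent & species)
    )

# Hypothetical toy data.
clusters = {
    "c1": {"spA", "spB", "spC"},
    "c2": {"spA", "spB"},
    "c3": {"spC"},
}
print(search_by_profile(clusters, present=["spA", "spB"], absent=["spC"]))  # ['c2']
```

A query that requires a small set of species and excludes all others is one way to look for proteins specific to a small taxonomic group.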