Chloroplast genomes exhibit an eight-cluster structure in triplet frequency space. The elements to be clustered are small fragments of a genome, each converted into a triplet frequency dictionary. The typical structure consists of eight clusters: six correspond to the three positions of a reading frame, shifted by 0, 1, and 2 nucleotides, on each of the two opposing strands; the seventh corresponds to the junk regions of a genome; and the eighth comprises fragments with excessive GC-content that bear specific RNA genes. The structure exhibits a specific symmetry.
Background
A seven-cluster pattern, claimed to be universal in bacterial genomes, has been reported previously. Keeping in mind the most popular theory of chloroplast origin, we checked whether a similar pattern is observed in chloroplast genomes.
Results
Surprisingly, an eight-cluster structure has been found in chloroplasts. The pattern observed for chloroplasts differs significantly both from the bacterial one and from that observed for cyanobacteria. The structure is revealed by clustering fragments of equal length isolated within a genome; each fragment is converted into a triplet frequency dictionary built from non-overlapping triplets that tile the fragment with no gaps. The resulting points in 63-dimensional space were clustered using the elastic map technique. The eighth cluster found in chloroplasts comprises the genome fragments that bear tRNA genes and exhibit excessively high GC-content in comparison to the entire genome.
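The conversion of a fragment into a triplet frequency dictionary can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `triplet_frequencies` and `cut_fragments`, the handling of ambiguous symbols, and the fragment length and step in the usage note are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

# All 64 triplets over the alphabet {A, C, G, T}, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(fragment):
    """Return the triplet frequency dictionary of a fragment.

    Triplets are read from position 0 without overlap and without gaps.
    Triplets containing symbols other than A, C, G, T are ignored
    (an assumption; the source does not say how they are treated).
    """
    fragment = fragment.upper()
    usable = len(fragment) - len(fragment) % 3  # drop an incomplete tail
    counts = Counter(fragment[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        raise ValueError("fragment contains no valid triplets")
    return {t: counts[t] / total for t in TRIPLETS}


def cut_fragments(genome, length, step):
    """Cut a genome into fragments of equal length, taken every `step` bases."""
    last_start = len(genome) - length
    return [genome[s:s + length] for s in range(0, last_start + 1, step)]
```

For instance, `[triplet_frequencies(f) for f in cut_fragments(genome, 603, 11)]` would give one point per fragment (the values 603 and 11 are hypothetical). Each dictionary holds 64 frequencies that sum to one, so the points lie in a 63-dimensional hyperplane, which is consistent with the dimension quoted above.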
Conclusion
Chloroplasts exhibit a very specific type of symmetry in the distribution of coding and non-coding genome fragments in the space of triplet frequencies: mirror symmetry. Cyanobacteria may exhibit both mirror symmetry and the rotational symmetry typical of other bacteria.
Abstract. The spatial structure of the photosystem I and II genes of algal chloroplasts is considered. Spatial structure is understood here as the distribution of the points corresponding to the frequency dictionaries of genes in the space of triplet frequencies. The photosystem genes form two main clusters corresponding to the forward and reverse strands. Within the main clusters, no grouping of points was found either by species or by gene type, of the kind characteristic of the chloroplasts of land plants and of cyanobacteria. The distribution by GC-content is non-uniform. Some superphyla show a gradient distribution, while others show no pronounced order in the distribution. Keywords: order, clustering of distributions, evolution, triplets. Citation: Senashova M.Yu. Spatial structure of the genes of the photosynthetic system of algal chloroplasts from the standpoint of bioinformatics / M.Yu. Senashova // Информационные и математические технологии в науке и управлении.
We studied the statistical properties of the non-coding regions of the chloroplast genomes of 391 plants. To do so, each non-coding region was tiled with a set of overlapping fragments of the same length, and those fragments were transformed into triplet frequency dictionaries. The dictionaries were clustered in 64-dimensional Euclidean space. Five types of distribution were identified: ball, ball with tail, ball with two tails, lens with tail, and lens with two tails. The multigenome distribution was also studied: ten species form an isolated and distant cluster; surprisingly, there is no immediate or simple pattern in the taxonomic composition of these clusters.
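The tiling of a non-coding region with overlapping fragments, and the Euclidean distance on which such clustering rests, might look like the sketch below. The names `tile`, `frequency_vector`, and `distance` are hypothetical, as are any window and step values; the abstract does not state how triplets are read within a fragment, so the sketch assumes they are read without overlap.

```python
import math
from itertools import product

# Fixed ordering of the 64 triplets; it defines the coordinate axes.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def tile(region, window, step):
    """Tile a region with fragments of length `window`.

    Consecutive fragments overlap whenever step < window.
    """
    return [region[s:s + window]
            for s in range(0, len(region) - window + 1, step)]


def frequency_vector(fragment):
    """Map a fragment to a point in 64-dimensional space.

    Assumption: triplets are read without overlap from position 0,
    and triplets with symbols other than A, C, G, T are skipped.
    """
    vector = [0.0] * 64
    usable = len(fragment) - len(fragment) % 3
    for i in range(0, usable, 3):
        position = INDEX.get(fragment[i:i + 3].upper())
        if position is not None:
            vector[position] += 1.0
    total = sum(vector)
    return [x / total for x in vector] if total else vector


def distance(u, v):
    """Euclidean distance between two frequency vectors."""
    return math.dist(u, v)
```

Any clustering method that works with Euclidean distances could then be applied to the vectors produced by `frequency_vector` for every fragment returned by `tile`.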