The paper presents a novel approach to inferring structuredness in a set of symbol sequences, such as transcriptome nucleotide sequences. The distribution pattern of triplet frequencies in the Siberian larch (Larix sibirica Ledeb.) transcriptome sequences was investigated in the presented study. It was found that the larch transcriptome demonstrates a number of unexpected symmetries in its statistical and combinatorial properties.

Keywords: nucleotide sequence complexity, frequency dictionary, order, Larix sibirica, Siberian larch, symmetry, transcriptome, triplet.

DOI: 10.17516/1997-1389-2015

We studied order and structuredness over a set of sequences from a finite alphabet. For our further analysis we also assumed that no other symbols or blank spaces occur in a sequence, and that each sequence under consideration is coherent (i.e. consists of a single piece).

The key idea in our search for structure and order in a set of symbol sequences (transcriptome nucleotide sequences) is to translate the sequences into their frequency dictionaries (Bugaenko et al., 1996, 1997, 1998; Hu and Wang, 2001). There are a number of possible definitions of a frequency dictionary, but we use the basic one: a list of all the strings of a given length, each accompanied by its frequency (a detailed description is given below). Crucially, the transformation of a symbol sequence into a frequency dictionary allows us to map a set of sequences into a metric space, which provides powerful and extensive tools for analysis.

We will briefly outline the concept of our study and then demonstrate the main results obtained.
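As an illustration of the basic definition given above, a frequency dictionary of triplets could be computed along the following lines. This is a minimal sketch rather than the authors' implementation: the function name `frequency_dictionary`, the alphabet `ACGT`, the sliding window with a step of one symbol, and the treatment of the sequence as linear (not closed into a ring) are all assumptions made here for the example.

```python
from collections import Counter
from itertools import product

# Assumed alphabet for nucleotide sequences (not stated explicitly in the text).
ALPHABET = "ACGT"


def frequency_dictionary(sequence, q=3):
    """Map a sequence to {string: frequency} over all strings of length q.

    Assumptions for this sketch: overlapping windows with step 1,
    a linear (non-circular) sequence, and no symbols outside ALPHABET.
    """
    sequence = sequence.upper()
    if set(sequence) - set(ALPHABET):
        raise ValueError("sequence contains symbols outside the alphabet")
    total = len(sequence) - q + 1  # number of windows of length q
    if total <= 0:
        raise ValueError("sequence is shorter than q")
    counts = Counter(sequence[i:i + q] for i in range(total))
    # List every possible string, including those that never occur (frequency 0),
    # so that every sequence maps to a point in the same 4**q-dimensional space.
    words = ("".join(w) for w in product(ALPHABET, repeat=q))
    return {w: counts[w] / total for w in words}


fd = frequency_dictionary("ACGTAC")
print(len(fd), fd["ACG"])
```

Because every dictionary lists the same 64 triplets, each sequence becomes a vector of fixed length, which is what makes a distance between sequences well defined.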
First, we converted each symbol sequence (that is, a nucleotide sequence in the Siberian larch transcriptome set) into a frequency dictionary. Then we studied the distribution of those dictionaries in a multidimensional space, trying to infer any regularities and clusters.

Second, for each clustering we checked its stability. The clustering was carried out using the K-means technique.

Third, we compared the statistical properties of the clusters identified by K-means and found that these clusters demonstrated a very strong symmetry in terms of their statistical properties. In brief, the clusters showed an extremely low level of discrepancy in Chargaff's second parity rule. This low discrepancy is the most intriguing fact concerning the properties of the studied transcriptome sequence set.

Materials and Methods

Transcriptome nucleotide sequence data

The transcriptome ... Surely, this part of the transcriptome requires special studies.

Frequency Dictionary

Previously (Bugaenko et al., 1996, 1997, 1998; Hu and Wang, 2001), a frequency dictionary was proposed to be a fun...
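The discrepancy in Chargaff's second parity rule mentioned in the outline above can be illustrated with a short sketch. The rule states that, within a single strand, the frequency of a string is close to the frequency of its reverse complement. The measure used below (the Euclidean distance between triplet frequencies and the frequencies of their reverse complements) is an assumption chosen for illustration and may differ from the measure used in the paper; the names `triplet_frequencies` and `parity_discrepancy` are likewise hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Translation table for Watson-Crick complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the complement of `word` read in the opposite direction."""
    return word.translate(COMPLEMENT)[::-1]


def triplet_frequencies(sequences):
    """Pooled triplet frequencies over a group of sequences (e.g. one cluster)."""
    counts = Counter()
    for s in sequences:
        s = s.upper()
        counts.update(s[i:i + 3] for i in range(len(s) - 2))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no triplets found: all sequences are shorter than 3")
    words = ("".join(w) for w in product("ACGT", repeat=3))
    return {w: counts[w] / total for w in words}


def parity_discrepancy(freqs):
    """Euclidean distance between f(w) and f(reverse complement of w).

    Illustrative measure only (an assumption of this sketch);
    a value of 0 means the parity rule holds exactly for triplets.
    """
    return sqrt(sum((f - freqs[reverse_complement(w)]) ** 2
                    for w, f in freqs.items()))


print(parity_discrepancy(triplet_frequencies(["ACGT"])))   # ACG and CGT are reverse complements
print(parity_discrepancy(triplet_frequencies(["AAAA"])))   # AAA occurs, TTT does not
```

Applied to the pooled sequences of each K-means cluster, a value near zero would correspond to the low discrepancy described in the text.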