In this paper, we propose a new 4D graphical representation of DNA primary sequences based on the Z-curve theory. The advantage of our approach is that it has a biological significance similar to that of the Z-curve and does not lose any biological information of the sequence. The geometrical centers of the 4D graphical representations of DNA sequences are used as numerical characterizations, and an examination of the similarities/dissimilarities among the coding sequences of the first exon of the β-globin genes of eleven different species shows that this approach can provide satisfactory results.
DNA sequences, 4D graphical representation, similarities/dissimilarities
Citation: Tang X C, Zhou P P, Qiu W Y. On the similarity/dissimilarity of DNA sequences based on 4D graphical representation.
DNA is known as the physical basis for the storage and delivery of genetic information, and the connection and arrangement of its constituent bases (adenine (A), guanine (G), thymine (T) and cytosine (C)), that is, the DNA sequence, plays a crucial role in determining the characteristics of an organism. The analysis of DNA sequences has attracted a lot of interest in recent years because of its importance for understanding the functions that genes carry out [1]. Graphical representation of DNA sequences has become a powerful tool in the investigation of gene sequences [2-17], because it offers visual inspection of data and facilitates the analysis, comparison and recognition of DNA sequences as well as of the differences in their structures. Several authors have proposed different 2D or 3D graphical representations of DNA sequences [2][3][4][5][6][7]. However, both 2D and 3D graphical representations are accompanied by some loss of biological information, owing to overlapping and crossing of the curve representing the DNA with itself and to an arbitrary decision regarding the choice of direction for the four bases [8][9][10][11][12][13][14][15][18]. Based on previous studies, Liao et al. [8][9][10][11][12][13][14][15] presented many graphical representations that overcome these limitations. In addition, the concept of a geometrical center was introduced into these nonsingular methods, and various similarity/dissimilarity matrices were constructed for comparing DNA sequences. Zhang et al. [16,17] created the Z-curve to describe DNA sequences. It characterizes a DNA sequence well without causing any loss of the biological information of the sequence. Inspired by the idea of the Z-curve, we propose a new 4D graphical representation of DNA sequences in this work. It also represents the biological characteristics of DNA sequences well and, more importantly, it avoids loss of information in the transfer from a DNA sequence to its geometrical representation.
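For context, the lossless property of the Z-curve of Zhang et al. [16,17] can be illustrated with a short sketch. It shows only the standard three-component Z-curve, not the 4D representation proposed in this paper. The function names `z_curve` and `decode_z_curve` are hypothetical and chosen for illustration, and the input is assumed to contain only the letters A, C, G and T.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int, int]

# Per-base increments of the three Z-curve components:
#   x: purine (A, G) vs. pyrimidine (C, T)
#   y: amino (A, C) vs. keto (G, T)
#   z: weak H-bond (A, T) vs. strong H-bond (C, G)
STEPS: Dict[str, Point] = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq: str) -> List[Point]:
    """Return the cumulative Z-curve points (x_n, y_n, z_n) for n = 1..len(seq)."""
    x = y = z = 0
    points: List[Point] = []
    for base in seq.upper():
        dx, dy, dz = STEPS[base]  # raises KeyError on letters other than A, C, G, T
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def decode_z_curve(points: List[Point]) -> str:
    """Recover the sequence from its Z-curve by reading the step between points."""
    inverse = {step: base for base, step in STEPS.items()}
    prev: Point = (0, 0, 0)
    bases: List[str] = []
    for p in points:
        step = (p[0] - prev[0], p[1] - prev[1], p[2] - prev[2])
        bases.append(inverse[step])
        prev = p
    return "".join(bases)


if __name__ == "__main__":
    curve = z_curve("ATGGTG")
    print(curve)
    print(decode_z_curve(curve))
```

Because each of the four bases produces a distinct step, the sequence can be read back from the curve point by point, which is the sense in which the Z-curve loses no information.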