The almost complete sequencing of the human genome [1, 2], together with public access to most of its content [3, 4], offers the opportunity to explore that content in depth and to data mine this unique information depository. The standard approach of representing genomic information by sequences of nucleotide symbols in the strands of DNA and RNA molecules, by symbolic codons (triplets of nucleotides), or by symbolic sequences of amino acids in the corresponding polypeptide chains (for the genes) limits the handling of genomic information to mere pattern matching or statistical procedures. Using a base 4 real representation, or an equivalent complex dual binary representation, of the nucleotides allows DNA sequences to be converted into digital genomic signals and makes a wealth of powerful signal processing methods available for their analysis. Currently, only about 32000 genes containing the
Conversion of nucleotides sequences into genomic signals
P. D. Cristea *
Bio-Medical Engineering Center, "Politehnica" University of Bucharest, Romania
Received: March 11, 2002; Accepted: April 29, 2002
Abstract
An original tetrahedral representation of the Genetic Code (GC) that better describes its structure, degeneration and evolution trends is defined. The possibility of reducing the dimension of the representation by projecting the GC tetrahedron onto an adequately oriented plane is also analyzed, leading to some equivalent complex representations of the GC. On these bases, optimal symbolic-to-digital mappings of the linear nucleic acid strands into real or complex genomic signals are derived at the nucleotide, codon and amino acid levels. By converting the sequences of nucleotides and polypeptides into digital genomic signals, this approach makes it possible to use a large variety of signal processing methods for their handling and analysis.
It is also shown that some essential features of the nucleotide sequences can be better extracted using this representation. Specifically, the paper reports for the first time the existence of a global helicoidal wrapping of the complex representations of the bases along DNA sequences, a large-scale trend of genomic signals. New tools for genomic signal analysis, including the use of phase, aggregated phase, unwrapped phase, sequence path, and stem representation of components' relative frequencies, as well as analysis of the transitions, are introduced at the nucleotide, codon and amino acid levels, and in a multiresolution approach.
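The symbolic-to-digital conversion and the phase tools named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the mapping itself, so the quadrant assignment in `BASE_TO_COMPLEX` is an assumption made for illustration, as are the function names and the reading of "aggregated phase" as a running sum of per-base phases.

```python
import numpy as np

# Hypothetical mapping: each base sits in one quadrant of the complex plane.
# The specific assignment is an assumption, not taken from the abstract.
BASE_TO_COMPLEX = {
    "A": 1 + 1j,
    "C": -1 - 1j,
    "G": -1 + 1j,
    "T": 1 - 1j,
}


def to_genomic_signal(seq):
    """Convert a nucleotide string into a complex-valued NumPy array."""
    return np.array([BASE_TO_COMPLEX[base] for base in seq.upper()], dtype=complex)


def aggregated_phase(signal):
    """Running sum of the per-base phases in radians (assumed definition)."""
    return np.cumsum(np.angle(signal))


def unwrapped_phase(signal):
    """Per-base phase with jumps of 2*pi between neighbouring samples removed."""
    return np.unwrap(np.angle(signal))


signal = to_genomic_signal("ACGT")
agg = aggregated_phase(signal)
unw = unwrapped_phase(signal)
```

Once a sequence is an ordinary complex array, standard signal processing routines (filtering, spectra, multiresolution transforms) can be applied to it directly; plotting `agg` or `unw` against position is one way to look for large-scale trends along a sequence.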
An original tetrahedral representation of the Genetic Code (GC) that better captures its structure, degeneracy and evolution trends is defined. The possibility of reducing the dimensionality of the description by projecting the GC tetrahedron onto an adequately oriented plane is also considered, leading to complex representations of the GC. On these bases, optimal symbolic-to-digital mappings of the linear, one-dimensional and one-directional strands of nucleic acids into real or complex genetic signals are derived at the nucleotide, codon and amino acid levels. By converting the sequences of nucleotides and polypeptides into digital genetic signals, this approach makes it possible to use a large variety of signal processing methods for their processing and analysis. It is also shown that some essential features of nucleotide sequences can be better extracted using this representation. Some preliminary results of a comparative analysis of the statistical properties of intragenic vs. intergenic genetic signals are also presented. The use of Independent Component Analysis (ICA) to search for control sequences in the intergenic DNA, i.e., the part of the genome that does not encode proteins, is suggested.
Abstract: Perfect reconstruction, quality scalability, and region-of-interest coding are basic features needed for the image compression schemes used in telemedicine applications. This paper proposes a new wavelet-based embedded compression technique that efficiently exploits the intraband dependencies and uses a quadtree-based approach to encode the significance maps. The algorithm produces a losslessly compressed embedded data stream, supports quality scalability, and permits region-of-interest coding. Moreover, experimental results obtained on various images show that the proposed algorithm provides competitive lossless/lossy compression results. The proposed technique is well suited for telemedicine applications that require fast interactive handling of large image sets over networks with limited and/or variable bandwidth.
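To make the idea of quadtree coding of a significance map concrete, here is a minimal Python sketch of a generic quadtree encoder for a square binary map. It is not the algorithm proposed in the paper, whose details the abstract does not give; the function name `quadtree_encode`, the power-of-two side length, and the quadrant visiting order are all assumptions made for illustration.

```python
import numpy as np


def quadtree_encode(sig_map):
    """Encode a square binary map (side a power of two) as a list of bits.

    Each visited block emits 1 if it holds any significant coefficient and
    0 otherwise; only significant blocks larger than 1x1 are split further.
    """
    bits = []

    def visit(block):
        significant = bool(block.any())
        bits.append(1 if significant else 0)
        # Stop at empty blocks and at single coefficients.
        if not significant or block.shape[0] == 1:
            return
        half = block.shape[0] // 2
        # Assumed order: top-left, top-right, bottom-left, bottom-right.
        for rows in (slice(0, half), slice(half, None)):
            for cols in (slice(0, half), slice(half, None)):
                visit(block[rows, cols])

    visit(np.asarray(sig_map))
    return bits


# Hypothetical 4x4 map with a single significant coefficient.
example = np.zeros((4, 4), dtype=int)
example[0, 1] = 1
bits = quadtree_encode(example)
```

A single 0 bit dismisses a whole block of insignificant coefficients, which is why sparse significance maps tend to encode compactly with this kind of scheme.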