Abstract: Using digital signal processing (DSP) in the genomic field is key to solving most problems in this area, such as predicting gene locations in a genomic sequence and identifying defective regions in a DNA sequence. However, DSP can be applied only after the symbol sequences are mapped into numbers. Many techniques for the numerical representation of DNA sequences have been developed in the literature. They can be classified into two types: Fixed Mapping (FM) and Physico-Chemical Property Based Mapping (PCPBM). The open question is which of these numerical representation techniques should be used. Answering it requires understanding these representations, bearing in mind that each mapping suits a particular application. This paper addresses the question and compares these techniques in terms of their precision in exon and intron classification. Simulations are carried out using short sequences of the human genome (GRCh37/hg19). The final results indicate that classification performance is a function of the numerical representation method.
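To make the two mapping families concrete, the sketch below encodes a DNA string with the Voss binary indicator mapping (commonly classed as a fixed mapping) and with the Electron-Ion Interaction Potential (EIIP) values widely used as a physico-chemical mapping. It then computes the period-3 spectral power, |X[N/3]|², that DSP-based exon detection typically relies on. The EIIP constants are the values commonly quoted in the literature; the function names are illustrative. This is a minimal sketch, not the paper's evaluation pipeline.

```python
import numpy as np

# EIIP values commonly quoted for the four nucleotides (physico-chemical mapping)
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def voss_indicators(seq: str) -> np.ndarray:
    """Fixed mapping: one binary indicator sequence per base (A, C, G, T), shape (4, N)."""
    seq = seq.upper()
    return np.array([[1.0 if b == base else 0.0 for b in seq] for base in "ACGT"])


def eiip_map(seq: str) -> np.ndarray:
    """Physico-chemical mapping: each base is replaced by its EIIP value, shape (N,)."""
    return np.array([EIIP[b] for b in seq.upper()])


def period3_power(signals: np.ndarray) -> float:
    """Sum of |X[N/3]|^2 over one or more mapped signals; N must be a multiple of 3."""
    signals = np.atleast_2d(signals)          # treat a 1-D signal as a single row
    n = signals.shape[1]
    if n % 3:
        raise ValueError("sequence length must be a multiple of 3")
    spectrum = np.fft.fft(signals, axis=1)[:, n // 3]   # DFT bin k = N/3 of each row
    return float(np.sum(np.abs(spectrum) ** 2))
```

For a perfectly codon-periodic string such as `"GCA" * 30`, `period3_power(voss_indicators(...))` gives about 2700 (three indicator sequences each contributing 30²), whereas a constant string such as `"A" * 90` gives essentially zero under either mapping.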
The Human Genome Project has led to a huge influx of genomic data. Since the completion of human genome sequencing, increasing effort has been devoted to identifying the splice sites of exons and introns (donor and acceptor sites). This calls on bioinformatics to analyze genome sequences and locate exon-intron boundaries, in other words, to predict splice sites. Predicting splice sites in the genic regions of a DNA sequence is one of the most challenging aspects of gene structure recognition. Over the last two decades, artificial neural networks have gradually become one of the essential tools in bioinformatics. In this paper, artificial neural networks are combined with different numerical mapping techniques to build an integrated model for splice site prediction in genes. An artificial neural network is trained and then used to find splice sites in human genes. The paper compares the mapping methods, using the trained neural network, in terms of their precision in predicting donor and acceptor sites. Training and performance evaluation of the neural network are carried out using sequences of the human genome (GRCh37/hg19-chr21). Simulation results indicate that combining the Electron-Ion Interaction Potential (EIIP) numerical mapping method with the neural network yields the best prediction performance.
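Before a neural network can score splice sites, each candidate site has to become a fixed-length numeric vector. The sketch below assumes the canonical GT (donor) and AG (acceptor) dinucleotides as candidate markers, takes a hypothetical window of `flank` bases on each side, and encodes the window with EIIP values. The window size and the helper names `candidate_windows` and `feature_matrix` are assumptions for illustration, not the configuration used in the paper.

```python
import numpy as np

# EIIP values commonly quoted for the four nucleotides
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def candidate_windows(seq: str, motif: str, flank: int):
    """Yield (position, EIIP feature vector) for each occurrence of `motif`
    whose window of `flank` bases on each side fits inside `seq`.

    Assumption: 'GT' marks candidate donor sites and 'AG' candidate acceptor
    sites; `flank` is a hypothetical window parameter.
    """
    seq = seq.upper()
    m = len(motif)
    for i in range(len(seq) - m + 1):
        if seq[i:i + m] != motif:
            continue
        start, end = i - flank, i + m + flank
        if start < 0 or end > len(seq):
            continue  # window would run off the sequence edge
        window = seq[start:end]
        if any(b not in EIIP for b in window):
            continue  # skip windows containing N or other ambiguity codes
        yield i, np.array([EIIP[b] for b in window])


def feature_matrix(seq: str, motif: str, flank: int):
    """Return (positions, matrix of shape (n_candidates, 2*flank + len(motif)))."""
    hits = list(candidate_windows(seq, motif, flank))
    positions = [p for p, _ in hits]
    width = 2 * flank + len(motif)
    features = np.vstack([f for _, f in hits]) if hits else np.empty((0, width))
    return positions, features
```

For example, `feature_matrix("CCAGGTAAGTCC", "GT", flank=2)` returns positions `[4, 8]` and a 2 × 6 matrix. Each row would be one input vector for the network; the network architecture and the training labels (for instance, annotated exon boundaries) are outside this sketch.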
Signals that represent information may be classified into two forms: numeric and symbolic. Symbolic signals, such as DNA symbolic sequences, cannot be processed directly with digital signal processing (DSP) techniques. The only way to apply DSP in the genomic field is to map DNA symbolic sequences to numerical sequences, so that biological properties are reflected in a numerical domain. This opens the way to a set of tools for solving genomic problems. Many techniques for the numerical representation of DNA sequences have been developed in the literature. Their main drawback is that each nucleotide is represented by a numerical value that depends only on the nucleotide type, ignoring its position in the codon and in the DNA sequence. In this paper, a new approach to DNA symbolic-to-numeric representation, called Circular Mapping (CM), is introduced. It is based on a graphical representation of the DNA sequence and maps each nucleotide to a complex value that depends not only on the nucleotide type but also on its position in the codon. The main applications of this method are gene prediction, which aims to locate protein-coding regions, and the classification of exons and introns in DNA sequences. The proposed approach showed a significant improvement in exon and intron classification compared with existing techniques. As the power spectral analysis results over sequences of the human genome (GRCh37/hg19) indicate, the classification efficiency of this method depends on the right choice of the mapping angle.
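The abstract does not give the exact Circular Mapping formula, so the sketch below only illustrates the idea it describes: each nucleotide becomes a unit-magnitude complex number whose phase depends on both the nucleotide type and its position within the codon. The base phases (A = 0, C = π/2, G = π, T = 3π/2) and the rule of adding k·θ at codon position k are hypothetical choices made here, with `theta` standing in for the mapping angle. The period-3 spectral power is then computed from the complex signal.

```python
import numpy as np

# Hypothetical base phases: the four nucleotides spread evenly on the unit circle.
BASE_PHASE = {"A": 0.0, "C": np.pi / 2, "G": np.pi, "T": 3 * np.pi / 2}


def circular_map(seq: str, theta: float) -> np.ndarray:
    """Illustrative position-dependent complex mapping (not the paper's exact definition).

    The base at index n gets phase BASE_PHASE[base] + (n % 3) * theta, so the same
    base maps to a different complex value at codon positions 0, 1 and 2.
    """
    seq = seq.upper()
    phases = np.array([BASE_PHASE[b] + (n % 3) * theta for n, b in enumerate(seq)])
    return np.exp(1j * phases)


def period3_power(x: np.ndarray) -> float:
    """|X[N/3]|^2 of a complex signal whose length N is a multiple of 3."""
    n = len(x)
    if n % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return float(np.abs(np.fft.fft(x)[n // 3]) ** 2)
```

Sweeping `theta` over a grid and comparing the period-3 power (or classifier accuracy) on known exons and introns is one way to explore how performance depends on the mapping angle. Under this particular hypothetical rule, θ = 2π/3 turns even a constant sequence into a pure period-3 tone (`"A" * 90` gives 8100), which shows why the angle must be chosen with care.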
Lung cancer is an insidious disease that produces no symptoms until it has spread widely in the body. Gene mutations are the first warning sign of such a disease. Therefore, classifying these mutations could guide treatment decisions for lung cancer. In this Letter, a novel accumulated grey-level image (AGLI) method for gene representation is introduced, in which each base in the gene sequence is represented by an accumulated number based on its order in the sequence, and the result is then mapped into the image domain. AGLI is combined with two-dimensional principal component analysis (2D PCA) to build an accurate, low-dimensional algorithm for classifying genetic mutations. The proposed algorithm was applied to the top 10 effective genes in lung cancer, achieving an accuracy of 99.27%. Experimental results show that, relative to existing methods, the proposed algorithm enhanced classification accuracy and reduced classification time for mutations in lung cancer.
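The abstract describes AGLI only at a high level, so the accumulation rule below is an assumption. Each base is replaced by a running count of how many times that base has occurred so far, the sequence is zero-padded (or truncated) into a square matrix, and the values are scaled to grey levels from 0 to 255. The 2D PCA step follows the standard two-dimensional PCA formulation: build the image covariance matrix G = (1/M) Σᵢ (Aᵢ − Ā)ᵀ(Aᵢ − Ā) and project each image onto G's leading eigenvectors. The names `agli_image`, `fit_2dpca` and `project`, and the image side length, are hypothetical.

```python
import numpy as np


def agli_image(seq: str, side: int) -> np.ndarray:
    """Illustrative accumulated grey-level image (assumed accumulation rule).

    Each base is replaced by how many times that base has appeared so far
    (1 for its first occurrence, 2 for its second, ...). Values are zero-padded
    or truncated to side*side, reshaped row-major, and scaled to [0, 255].
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    values = []
    for b in seq.upper():
        if b in counts:
            counts[b] += 1
            values.append(counts[b])
        else:
            values.append(0)  # ambiguous bases (e.g. N) contribute 0
    flat = np.zeros(side * side)
    n = min(len(values), side * side)
    flat[:n] = values[:n]
    peak = flat.max()
    if peak > 0:
        flat = flat / peak * 255.0
    return flat.reshape(side, side)


def fit_2dpca(images: np.ndarray, d: int):
    """Two-dimensional PCA on images of shape (M, h, w).

    Returns the mean image and the top-d projection axes, shape (w, d).
    """
    mean = images.mean(axis=0)
    centered = images - mean
    # Image covariance matrix G = (1/M) * sum_i (A_i - mean)^T (A_i - mean), shape (w, w)
    g = np.einsum("mij,mik->jk", centered, centered) / len(images)
    eigvals, eigvecs = np.linalg.eigh(g)  # symmetric matrix: real, ascending eigenvalues
    axes = eigvecs[:, np.argsort(eigvals)[::-1][:d]]
    return mean, axes


def project(image: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Feature matrix Y = A X, shape (h, d), for a downstream classifier."""
    return image @ axes
```

The rows of `project(image, axes)` form the low-dimensional feature matrix that a classifier would consume; the classifier and the number of retained axes `d` are not specified in the abstract.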