The prediction of protein coding regions in DNA sequences is an important problem in computational biology. Nucleotides in the protein coding regions (exons) of a DNA sequence are observed to show a period-3 property, so identifying period-3 regions helps predict gene locations within the billions-of-bases-long DNA sequences of eukaryotic cells. This property lets signal processing based time-domain and frequency-domain methods predict these regions efficiently, and several such approaches have therefore been applied to the problem. This paper describes novel and efficient comb filter-based techniques for the prediction of protein coding regions based on the period-3 behavior of codon sequences. The proposed method is validated on the Burset/Guigo1996, HMR195 and KEGG standard datasets using various prediction measures. It is shown that a cascaded differentiator comb (CDC) filter can predict protein coding regions with better prediction efficiency and lower computational complexity than other signal processing techniques based on the period-3 property.
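As a rough illustration of the period-3 property itself, the sketch below measures sliding-window spectral power at 1/3 cycles per base over binary (Voss) indicator sequences. This is a common baseline, not the CDC filter described in the abstract; the function names, the binary mapping and the window length of 351 are assumptions made for the example.

```python
import numpy as np

def indicator_sequences(dna):
    """Binary (Voss) indicator sequences: one 0/1 array per nucleotide."""
    seq = np.array(list(dna.upper()))
    return {base: (seq == base).astype(float) for base in "ACGT"}

def period3_power(dna, window=351):
    """Sliding-window spectral power at frequency 1/3 cycles per base.

    The window length is an assumed value (a multiple of 3), not one
    taken from the abstract.
    """
    n = len(dna)
    # Complex exponential at the period-3 frequency.
    phase = np.exp(-2j * np.pi * np.arange(n) / 3.0)
    kernel = np.ones(window)
    power = np.zeros(n - window + 1)
    for x in indicator_sequences(dna).values():
        # Windowed sum of x[n] * exp(-j*2*pi*n/3); its magnitude equals
        # the magnitude of the local DFT bin at 1/3 cycles per base.
        s = np.convolve(x * phase, kernel, mode="valid")
        power += np.abs(s) ** 2
    return power
```

A strictly codon-periodic string such as `"ATG" * 200` gives a large, constant power, while a homopolymer such as `"A" * 600` gives essentially zero, which is the contrast that period-3 methods rely on.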
Discriminating protein coding regions (exons) from noncoding regions (introns, or junk DNA) in eukaryotic cells is a computationally intensive task: the DNA string is very long, so processing it requires large computation time. Further, DNA sequences are inherently random and have vast redundancy, hidden regularities, long repeats and complementary palindromes, and therefore cannot be compressed efficiently. The objective of this study is to present an integrated signal processing algorithm that considerably reduces the computational load by compressing the DNA sequence effectively and thereby aids the search for coding regions in DNA sequences. The presented algorithm is based on the Discrete Wavelet Transform (DWT), a very fast and effective method for data compression, followed by a comb filter for effective prediction of protein coding period-3 regions in DNA sequences. The algorithm is validated using standard datasets such as HMR195, Burset and Guigo, and KEGG.
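The compression step can be pictured with a single-level DWT, which halves the length of a numeric sequence. The sketch below uses the Haar wavelet written directly in NumPy; the abstract does not state the wavelet family or the number of decomposition levels, so both are assumptions, and `haar_dwt` / `haar_idwt` are hypothetical helper names.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        # Pad odd-length input by repeating the last sample.
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass branch, half length
    detail = (even - odd) / np.sqrt(2.0)  # high-pass branch, half length
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt for even-length signals."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out
```

Keeping only the approximation coefficients gives a sequence half as long for any later filtering stage; how the coefficients are actually passed to the comb filter is not specified in the abstract.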
The development of efficient gene prediction algorithms is one of the fundamental efforts in genomics. In genomic signal processing, the basic step in identifying protein coding regions in DNA sequences relies on the period-3 property exhibited by nucleotides in exons. Several approaches based on signal processing tools and numerical representations have been applied to this problem in pursuit of more accurate predictions. This paper presents a new indicator sequence based on the amino acid sequence derived from the DNA string, called the amino acid indicator sequence. It allows the existing signal processing based time-domain and frequency-domain methods to predict these regions within the billions-of-bases-long DNA sequences of eukaryotic cells, and reduces the computational load by one-third. Each triplet of bases, called a codon, instructs the cell machinery to synthesize an amino acid. The codon sequence therefore uniquely identifies an amino acid sequence, which defines a protein, so the protein coding region is characterized by the codons in the amino acid sequence. This property is used to detect period-3 regions from the amino acid sequence, with physico-chemical properties of amino acids providing the numerical representation. Various accuracy measures such as exonic peaks, discriminating factor, sensitivity, specificity, miss rate, wrong rate and approximate correlation are used to demonstrate the efficacy of the proposed predictor. The proposed method is validated on various organisms using the standard datasets HMR195, Burset and Guigo, and KEGG. The simulation results show that the proposed method is an effective approach for protein coding region prediction.
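A minimal sketch of building such a numerical sequence follows: codons are translated with the standard genetic code and each amino acid is replaced by one physico-chemical value. The abstract does not say which property is used, so the Kyte-Doolittle hydropathy scale here is an assumption, as are the names `translate` and `amino_acid_signal` and the choice to map stop or unknown codons to 0.0.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Kyte-Doolittle hydropathy index (assumed property for illustration).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def translate(dna, frame=0):
    """Translate complete codons from the given frame; 'X' marks unknown codons."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )

def amino_acid_signal(dna, frame=0):
    """One numeric value per codon; stop and unknown codons map to 0.0."""
    return [HYDROPATHY.get(aa, 0.0) for aa in translate(dna, frame)]
```

The resulting signal has one sample per codon, so it is one-third the length of the nucleotide sequence it was derived from.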
Correlating gene expression profiles with disease or with different developmental stages of a cell through microarray data analysis has been of great interest in molecular biology. Because microarray data contain thousands of genes but very few samples, efficient feature extraction and computational methods are necessary for the analysis. In this paper we propose an effective feature extraction method based on factor analysis (FA) with the discrete wavelet transform (DWT) to detect informative genes. A radial basis function neural network (RBFNN) classifier, which has lower complexity than other classifiers, is used to efficiently predict the sample class. The potential of the proposed approach is evaluated through an exhaustive study on many benchmark datasets. The experimental results show that the proposed method can be a useful approach for cancer classification.
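Only the classification stage is sketched here, as a very small RBF network in NumPy: Gaussian hidden units around given centers and a linear output layer fitted by least squares. The abstract gives no details of the network, so the training scheme, the `gamma` width parameter, the hand-supplied centers and all function names are assumptions; the FA and DWT feature extraction steps are not reproduced.

```python
import numpy as np

def rbf_features(X, centers, gamma):
    """Gaussian activations exp(-gamma * ||x - c||^2) for every sample/center pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def _design_matrix(X, centers, gamma):
    # Hidden-layer activations plus a constant bias column.
    phi = rbf_features(X, centers, gamma)
    return np.hstack([phi, np.ones((len(X), 1))])

def fit_rbfnn(X, y, centers, gamma=1.0):
    """Fit output weights by least squares against one-hot class targets."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    centers = np.asarray(centers, dtype=float)
    targets = np.eye(int(y.max()) + 1)[y]
    weights, *_ = np.linalg.lstsq(_design_matrix(X, centers, gamma), targets, rcond=None)
    return weights

def predict_rbfnn(X, centers, weights, gamma=1.0):
    """Predict the class with the largest output score."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    return (_design_matrix(X, centers, gamma) @ weights).argmax(axis=1)
```

In practice the centers would usually come from clustering the training samples in the reduced feature space; here they are passed in directly to keep the example short.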
Microarray data is inherently noisy due to the noise contaminated from various sources during the preparation of mi-