Genomic signal processing (GSP) applies digital signal processing methods to the analysis of genomic data. Convolutional neural networks (CNNs) are state-of-the-art machine learning classifiers that have been applied successfully to a wide range of complex problems. In this paper, we present a deep learning architecture and a method for classifying three functional types of genomic sequence: coding regions (CDS), long noncoding regions (LNC), and pseudogenes (PSD). The method uses GSP techniques to convert each nucleotide sequence into a graphical representation of the information it contains. The model achieved accuracy scores of 83% and 84% when classifying CDS vs. LNC and CDS vs. PSD, respectively, but was not able to differentiate between PSD and LNC. These results indicate the feasibility of combining CNNs with GSP for the classification of these types of DNA sequences.
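The abstract does not state which GSP transform produces the graphical representation. As an illustration only, the sketch below assumes one common choice: a Voss binary-indicator mapping followed by a short-time Fourier power spectrum, which yields a 2D array that a CNN could take as an image. The function names, the window length, and the step size are hypothetical and are not taken from the paper.

```python
import numpy as np


def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (rows: A, C, G, T).

    Assumption: Voss mapping; the paper's actual numeric mapping is not stated.
    """
    seq = seq.upper()
    return np.array([[1.0 if base == nt else 0.0 for base in seq] for nt in "ACGT"])


def spectrogram_image(seq, window=64, step=16):
    """Build a 2D time-frequency image from a nucleotide sequence.

    For each sliding window, the power spectra of the four indicator
    sequences are summed. window and step are illustrative values.
    """
    x = voss_indicators(seq)
    if x.shape[1] < window:
        raise ValueError("sequence is shorter than the analysis window")
    taper = np.hanning(window)  # reduce spectral leakage at window edges
    frames = []
    for start in range(0, x.shape[1] - window + 1, step):
        segment = x[:, start:start + window] * taper
        power = np.abs(np.fft.rfft(segment, axis=1)) ** 2
        frames.append(power.sum(axis=0))  # combine the A, C, G, T channels
    # Rows are frequency bins, columns are positions along the sequence.
    return np.array(frames).T
```

Calling `spectrogram_image(sequence)` on a sequence of at least `window` bases returns an array with `window // 2 + 1` rows, one column per window position. Fixed-size inputs for a CNN would still require cropping, padding, or resizing, which this sketch leaves out.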