Background: One of the challenges in bioinformatics is the characterization of genetic diseases, and more precisely of the anomalies in the genetic code that lead to the onset of various pathologies. Leukemia occurs in several forms, broadly divided into acute and chronic; the acute forms include Acute Lymphoblastic Leukemia (ALL) and Acute Myeloid Leukemia (AML). This paper considers a dataset of patients belonging to two distinct classes, ALL and AML. The aim is to define a feature selection and analysis process, based mainly on Deep Learning, both for classifying each patient's leukemia as ALL or AML and for identifying the list of differentially expressed genes.
Method: The analyzed data come from dual-channel microarray experiments retrieved from the Gene Expression Omnibus (GEO), a public database of genomic data available on the NCBI website; they represent the methylation values of each gene in each sample. The analysis exploits feature selection techniques aimed at reducing the large number of variables (genes). To this end, we use linear models for differential expression analysis of microarray data, together with an autoencoder-based unsupervised deep learning model, to simplify and speed up the classification.
Results: After reducing the number of variables, we implemented classification models using a deep neural network (DNN), obtaining a classification accuracy of approximately 92%. We then compared these results with those of an approach based on support vector machines (SVM), which gave an accuracy of 87.39%. Moreover, we experimented with another feature selection approach, based on genetic algorithms, which obtained accuracies of 60.36% (DNN) and 30.63% (SVM).
Conclusions: To further verify the relevance of the selected set of genes, we conducted a gene enrichment analysis based on the functional annotation of the differentially expressed genes. As a result, a pathway that is differentially expressed between the two pathologies has been detected.
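The two-stage feature reduction outlined in the Method (a differential-expression filter followed by an autoencoder) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the data are synthetic, a plain Welch t-statistic stands in for the linear-model test, the autoencoder is a single tanh hidden layer written in NumPy, and all names and sizes (`welch_t`, `train_autoencoder`, 200 genes, 20 retained genes, 5 latent units) are hypothetical. The classification step (DNN or SVM) is omitted.

```python
import numpy as np

def welch_t(X, y):
    """Per-gene Welch t-statistic between class 0 and class 1 (rows are samples).
    Used here as a simple stand-in for a linear-model differential test."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def train_autoencoder(X, n_hidden, epochs=300, lr=0.01, seed=0):
    """One-hidden-layer tanh autoencoder trained by full-batch gradient descent.
    Returns the encoder function and the per-epoch reconstruction loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # latent codes
        err = (H @ W2 + b2) - X            # reconstruction error
        losses.append(float((err ** 2).sum(axis=1).mean()))
        gR = 2.0 * err / n                 # gradient of the loss w.r.t. the reconstruction
        gZ = (gR @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ gR); b2 -= lr * gR.sum(axis=0)
        W1 -= lr * (X.T @ gZ); b1 -= lr * gZ.sum(axis=0)
    return (lambda Z: np.tanh(Z @ W1 + b1)), losses

# Synthetic stand-in for the expression matrix: 60 samples, 200 genes,
# with the first 10 genes shifted in class 1 (assumed values, not real data).
rng = np.random.default_rng(42)
n_per_class, n_genes, n_informative = 30, 200, 10
y = np.repeat([0, 1], n_per_class)
X = rng.normal(size=(2 * n_per_class, n_genes))
X[y == 1, :n_informative] += 2.0

# Stage 1: keep the 20 genes with the largest absolute test statistic.
t = welch_t(X, y)
top = np.argsort(-np.abs(t))[:20]
X_sel = X[:, top]
X_sel = (X_sel - X_sel.mean(axis=0)) / X_sel.std(axis=0)  # standardize each gene

# Stage 2: compress the retained genes into 5 latent features.
encode, losses = train_autoencoder(X_sel, n_hidden=5)
codes = encode(X_sel)
print(codes.shape, round(losses[0], 3), round(losses[-1], 3))
```

The latent matrix `codes` would then be passed to a classifier in place of the full gene set, which is what makes the downstream training cheaper.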