Abstract. Several approaches have been proposed for the analysis of DNA microarray datasets, focusing on the performance and robustness of the final feature subsets. The novelty of this paper lies in the use of q-values to pre-filter the features of a DNA microarray dataset, identifying the most significant ones and incorporating this information into a genetic algorithm for further feature selection. When applied to a lung cancer microarray dataset, the method achieves performance rates similar to those of the standard algorithm, together with greater robustness of the selected features (an average robustness improvement of 36.21%).

1 Scientific Background

DNA microarray technology has been widely used for gene expression profiling and cancer prediction. Analysis of such data involves a problem commonly referred to as the curse of dimensionality [9]: each sample is described by thousands of features (genes), while only a few samples, often fewer than a hundred, are available. Several approaches have been proposed to identify relevant genes that classify the disorder under investigation with good performance. However, these approaches lack a desirable property when identifying gene expression profiles: robustness. A common weakness of such methods is the instability of their results, with high variability in the identified features across repeated executions of the algorithm. To tackle this problem, recent works have proposed methodologies that aim to select robust feature subsets with good performance rates on test data [7,10].

Applying a statistical test to many features against some null hypothesis is common practice, with the expectation that a proportion of those features will be incorrectly considered significant [8]. In such circumstances it is important to use some form of false discovery rate technique, either to adjust the p-values [1] or to use a different measure that accounts for false positives, such as the q-value [8].
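The q-value pre-filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in [8] or in this paper. The proportion of truly null features, pi0, is estimated with a single tuning value lambda = 0.5 rather than the smoothed estimator of [8]. Welch's unequal-variance t-test stands in for the modified t-test, whose exact form is not given here. The helper names (`estimate_pi0`, `qvalues`, `prefilter_genes`) and the 0.05 threshold are hypothetical.

```python
import numpy as np
from scipy import stats


def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of truly null features from the p-values.

    Assumption: a single tuning value `lam` is used; the original method [8]
    smooths this estimate over a grid of lambda values.
    """
    p = np.asarray(pvalues, dtype=float)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return min(pi0, 1.0)


def qvalues(pvalues, pi0=1.0):
    """Return one q-value per p-value (pi0 = 1 gives Benjamini-Hochberg values)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Estimated FDR when every feature up to the i-th smallest p-value is called significant.
    fdr = pi0 * m * ranked / np.arange(1, m + 1)
    # q_(i) = min over j >= i of fdr_(j): a running minimum taken from the largest p-value down.
    q_sorted = np.minimum(np.minimum.accumulate(fdr[::-1])[::-1], 1.0)
    q = np.empty(m)
    q[order] = q_sorted  # restore the original feature order
    return q


def prefilter_genes(X, y, q_threshold=0.05, lam=0.5):
    """Keep genes whose two-class difference is significant at the given q-value.

    X: samples x genes expression matrix; y: binary class labels (0/1).
    Assumption: Welch's t-test replaces the paper's modified t-test.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    _, p = stats.ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=False)
    q = qvalues(p, pi0=estimate_pi0(p, lam))
    return np.flatnonzero(q <= q_threshold), q


if __name__ == "__main__":
    # Synthetic data: 40 samples, 200 genes, the first 5 shifted in class 1.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 200))
    y = np.repeat([0, 1], 20)
    X[y == 1, :5] += 3.0
    selected, q = prefilter_genes(X, y)
    print("genes kept:", selected.tolist())
```

With pi0 fixed at 1 the q-values coincide with Benjamini-Hochberg adjusted p-values. Estimating pi0 below 1 makes them less conservative, so more features can pass at the same false discovery rate. The indices returned by `prefilter_genes` are the kind of candidate pool a subsequent genetic algorithm could search over.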
Using such a measure allows further analysis to focus on the features that can be considered significant, that is, those for which the null hypothesis is rejected. In the original paper [8], this methodology reduced the number of features identified in the Hedenfalk dataset from 605 to 162, out of a total of 3170 features.

In this paper, a modified t-test and q-values [8] are incorporated into a feature selection procedure similar to the genetic algorithm (GA) described in [7], with the purpose of identifying genes that are significant in differentiating lung cancer microarray expression profiles. In the approach of [7], biological information from the KEGG database [5, 6] was included in the GA, resulting in more robust feature subsets with good performance