Cancer research is a challenging and competitive field. The study of gene expression data using unsupervised learning has enabled the discovery of previously unknown types of cancer. However, the volume of genomic sequence data is growing exponentially: since 2011, global annual sequencing capacity has been estimated at quadrillions of bases and counting. To cope with this growth, we propose in this paper an implementation of a differential evolution clustering algorithm based on the MapReduce methodology, designed to handle big data. The proposed algorithm consists of three consecutive levels. Experiments were conducted on 18 real gene expression data sets. The results show that our approach is effective and competitive with existing algorithms.
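To make the idea of pairing differential evolution with a map/reduce-style fitness evaluation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common DE/rand/1/bin variant, a sum-of-squared-errors objective, and an in-process simulation of the map and reduce steps; the three levels of the proposed algorithm are not described in the abstract and are not reproduced here. All names (`partial_sse`, `fitness`, `de_cluster`) and parameter defaults are hypothetical.

```python
import random
from functools import reduce


def partial_sse(chunk, centroids):
    """Map step (assumed): squared error of one data chunk to its nearest centroids."""
    total = 0.0
    for point in chunk:
        total += min(
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in centroids
        )
    return total


def fitness(chunks, centroids):
    """Reduce step (assumed): add up the partial errors of all chunks."""
    partials = (partial_sse(chunk, centroids) for chunk in chunks)
    return reduce(lambda a, b: a + b, partials, 0.0)


def de_cluster(chunks, k, dim, pop_size=12, generations=40, f=0.5, cr=0.9, seed=0):
    """DE/rand/1/bin over flat vectors of k * dim centroid coordinates."""
    rng = random.Random(seed)
    # For brevity the sketch gathers all points to seed the population;
    # a real big-data job would sample them instead.
    points = [p for chunk in chunks for p in chunk]
    size = k * dim

    def unflatten(vec):
        return [vec[i * dim:(i + 1) * dim] for i in range(k)]

    pop = [[x for p in rng.sample(points, k) for x in p] for _ in range(pop_size)]
    scores = [fitness(chunks, unflatten(v)) for v in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(size)  # guarantees at least one mutated gene
            trial = [
                pop[a][j] + f * (pop[b][j] - pop[c][j])
                if (rng.random() < cr or j == j_rand)
                else pop[i][j]
                for j in range(size)
            ]
            trial_score = fitness(chunks, unflatten(trial))
            # Greedy selection: keep the trial only if it is no worse
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return unflatten(pop[best]), scores[best]


# Toy usage: two chunks standing in for two data partitions
chunks = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 10), (10, 11), (11, 10), (11, 11)],
]
centroids, score = de_cluster(chunks, k=2, dim=2)
```

Because the objective is a plain sum over data points, each partition can be scored independently and the partial results combined afterwards, which is what makes this kind of fitness function a natural fit for MapReduce.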
Gene regulatory network (GRN) inference is a challenging problem that lends itself to being framed as a learning task. Both positive and negative examples are needed to perform supervised and semi-supervised learning. However, GRN datasets include only positive examples and/or unlabeled ones. Recently, there has been growing interest in generating negative examples from unlabeled data. Within this context, the authors propose to generate candidate negative examples from the set of unlabeled ones and keep those that lead to the best classification accuracy when used together with the positive examples. For this purpose, a newly proposed genetic algorithm for fixed-size subset selection is combined with a support vector machine model. The authors assessed the performance of the proposed approach using simulated and experimental datasets. On simulated datasets, the proposed approach outperforms the other methods in most cases and improves the performance metrics when using balanced data. On experimental datasets, the proposed approach finds the optimal solution for each transcription factor in the study.
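The fixed-size subset selection step can be illustrated with a small genetic algorithm whose operators never change the subset size. This is a sketch under stated assumptions, not the authors' algorithm: the crossover and mutation operators, the elitist replacement scheme, and all names (`ga_fixed_subset`, `toy_fitness`) are hypothetical. In the setting described above, the fitness would be the classification accuracy of a support vector machine trained on the positive examples plus the selected unlabeled examples treated as negatives; here a toy fitness stands in for it.

```python
import random


def ga_fixed_subset(n_items, subset_size, fitness, pop_size=20,
                    generations=50, mutation_rate=0.2, seed=0):
    """Search for a subset of exactly `subset_size` indices that maximizes `fitness`."""
    rng = random.Random(seed)
    universe = range(n_items)
    pop = [frozenset(rng.sample(universe, subset_size)) for _ in range(pop_size)]

    def crossover(p1, p2):
        # Keep the items both parents share, then fill up from the rest
        child = set(p1 & p2)
        pool = list((p1 | p2) - child)
        rng.shuffle(pool)
        child.update(pool[:subset_size - len(child)])
        return child

    def mutate(child):
        # Swap one member for one non-member so the size stays fixed
        if rng.random() < mutation_rate and len(child) < n_items:
            removed = rng.choice(sorted(child))
            added = rng.choice([i for i in universe if i not in child])
            child.remove(removed)
            child.add(added)
        return frozenset(child)

    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        elite = ranked[:pop_size // 2]  # the better half survives unchanged
        children = [
            mutate(crossover(*rng.sample(elite, 2)))
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children

    return max(pop, key=fitness)


def toy_fitness(subset):
    """Placeholder objective (assumption): prefers subsets with large indices."""
    return sum(subset)


best = ga_fixed_subset(n_items=10, subset_size=3, fitness=toy_fitness)
```

Representing each individual as a set and restricting mutation to swaps avoids the repair step that a bit-string encoding would need to restore the required subset size after crossover.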