This paper proposes a feature selection method that is effective in distinguishing colorectal cancer patients from normal individuals using K-means clustering and the modified harmony search algorithm. As the genetic cause of colorectal cancer originates from mutations in genes, it is important to classify the presence or absence of colorectal cancer through gene information. The proposed methodology consists of four steps. First, the original data are Z-normalized by data preprocessing. Candidate genes are then selected using the Fisher score. Next, one representative gene is selected from each cluster after candidate genes are clustered using K-means clustering. Finally, feature selection is carried out using the modified harmony search algorithm. The gene combination created by feature selection is then applied to the classification model and verified using 5-fold cross-validation. The proposed model obtained a classification accuracy of up to 94.36%. Furthermore, on comparing the proposed method with other methods, we prove that the proposed method performs well in classifying colorectal cancer. Moreover, we believe that the proposed model can be applied not only to colorectal cancer but also to other gene-related diseases.
In this study, we propose a method to automatically find features from a dataset that are effective for classification or prediction, using a new method called multi-agent reinforcement learning and a guide agent. Each feature of the dataset has one of the main and guide agents, and these agents decide whether to select a feature. Main agents select the optimal features, and guide agents present the criteria for judging the main agents’ actions. After obtaining the main and guide rewards for the features selected by the agents, the main agent that behaves differently from the guide agent updates their Q-values by calculating the learning reward delivered to the main agents. The behavior comparison helps the main agent decide whether its own behavior is correct, without using other algorithms. After performing this process for each episode, the features are finally selected. The feature selection method proposed in this study uses multiple agents, reducing the number of actions each agent can perform and finding optimal features effectively and quickly. Finally, comparative experimental results on multiple datasets show that the proposed method can select effective features for classification and increase classification accuracy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.