Feature selection is an important way to improve the efficiency and accuracy of classifiers. However, traditional feature selection methods cannot handle many kinds of real-world data, such as multi-label data. Multi-label feature selection was developed to overcome this limitation, and it now plays an important role in pattern recognition and data mining by improving the efficiency and accuracy of multi-label classification. However, traditional multi-label feature selection based on mutual information does not fully consider the effect of redundancy among labels. This deficiency may lead to repeated computation of mutual information and leaves room to improve the accuracy of multi-label feature selection. To address this problem, this paper proposes a multi-label feature selection method based on conditional mutual information among labels (CRMIL). First, we analyze how to reduce the redundancy among features, building on existing work. Second, we propose a new approach to reduce the redundancy among labels: it takes label sets as conditions when calculating the relevance between features and labels, which weakens the impact of label redundancy on the feature selection results. Finally, we analyze the algorithm and balance the effects of relevance and redundancy in the evaluation function. To test CRMIL, we compare it with eight other multi-label feature selection algorithms on ten datasets, using four evaluation criteria. The experimental results indicate that CRMIL performs better than the compared algorithms.
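To make the idea of conditioning on labels concrete, the following is a minimal sketch of a plug-in estimate of the conditional mutual information I(X; Y | Z) for discrete variables, where X is a feature, Y is a label, and Z is another label used as the condition. It is only an illustration under stated assumptions: the function name `conditional_mutual_information` is hypothetical, the estimate uses raw frequencies, and it is not the CRMIL evaluation function, which the abstract does not spell out.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences.

    Hypothetical helper for illustration only; not the CRMIL criterion.
    x: feature values, y: target label values, z: conditioning label values.
    """
    n = len(x)
    if n == 0 or not (n == len(y) == len(z)):
        raise ValueError("x, y and z must be non-empty and of equal length")

    # Joint and marginal frequency counts.
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)

    cmi = 0.0
    for (xi, yi, zi), n_xyz in c_xyz.items():
        # p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ); the 1/n factors cancel.
        ratio = (c_z[zi] * n_xyz) / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        cmi += (n_xyz / n) * log2(ratio)
    return cmi


# A feature that copies label y1 is fully informative about it,
# even after conditioning on an unrelated label y2.
feature = [0, 0, 1, 1]
y1 = [0, 0, 1, 1]
y2 = [0, 1, 0, 1]
informative = conditional_mutual_information(feature, y1, y2)

# If the target label duplicates the conditioning label, the feature
# adds nothing new about it, so the conditional relevance vanishes.
redundant = conditional_mutual_information(feature, y2, y2)
```

In this toy example, conditioning on a label that already carries the same information as the target label drives the score to zero, which is the intuition behind discounting redundant labels when scoring features.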
Feature selection has become a vital issue in data mining and machine learning. However, several challenges remain when trying to improve the performance of feature selection, such as small sample sizes, uncertain classes, complex features, and complementarity and redundancy among features. In this paper, we first introduce the background of feature selection. We then present a new perspective for analyzing multi-label feature selection and review typical papers in each category. To further analyze these algorithms, we summarize the evaluation criteria used to assess the results of multi-label feature selection. Finally, we offer some reflections on research directions and future work, and present our conclusions.