Feature selection based on the fuzzy neighborhood rough set model (FNRS) is highly popular in data mining. However, the dependent function of FNRS only considers the information present in the lower approximation of the decision while ignoring the information present in the upper approximation of the decision. This construction method may lead to the loss of some information. To solve this problem, this paper proposes a fuzzy neighborhood joint entropy model based on fuzzy neighborhood self-information measure (FNSIJE) and applies it to feature selection. First, to construct four uncertain fuzzy neighborhood self-information measures of decision variables, the concept of self-information is introduced into the upper and lower approximations of FNRS from the algebra view. The relationships between these measures and their properties are discussed in detail. It is found that the fourth measure, named tolerance fuzzy neighborhood self-information, has better classification performance. Second, an uncertainty measure based on the fuzzy neighborhood joint entropy has been proposed from the information view. Inspired by both algebra and information views, the FNSIJE is proposed. Third, the K–S test is used to delete features with weak distinguishing performance, which reduces the dimensionality of high-dimensional gene datasets, thereby reducing the complexity of high-dimensional gene datasets, and then, a forward feature selection algorithm is provided. Experimental results show that compared with related methods, the presented model can select less important features and have a higher classification accuracy.
Feature selection is one of the core contents of rough set theory and application. Since the reduction ability and classification performance of many feature selection algorithms based on rough set theory and its extensions are not ideal, this paper proposes a feature selection algorithm that combines the information theory view and algebraic view in the neighborhood decision system. First, the neighborhood relationship in the neighborhood rough set model is used to retain the classification information of continuous data, to study some uncertainty measures of neighborhood information entropy. Second, to fully reflect the decision ability and classification performance of the neighborhood system, the neighborhood credibility and neighborhood coverage are defined and introduced into the neighborhood joint entropy. Third, a feature selection algorithm based on neighborhood joint entropy is designed, which improves the disadvantage that most feature selection algorithms only consider information theory definition or algebraic definition. Finally, experiments and statistical analyses on nine data sets prove that the algorithm can effectively select the optimal feature subset, and the selection result can maintain or improve the classification performance of the data set.
Abstract:Pistacia chinensis seed oil is proposed as a promising non-edible feedstock for biodiesel production. Different extraction methods were tested and compared to obtain crude oil from the seed of Pistacia chinensis, along with various deacidification measures of refined oil. The biodiesel was produced through catalysis of sodium hydroxide (NaOH) and potassium hydroxide (KOH). The results showed that the acid value of Pistacia chinensis oil was successfully reduced to 0.23 mg KOH/g when it was extracted using ethanol. Consequently, the biodiesel product gave a high yield beyond 96.0%. The transesterification catalysed by KOH was also more complete. Fourier transform infrared (FTIR) spectroscopy was used to monitor the transesterification reaction. Analyses by gas chromatography-mass spectrometry (GC-MS) and gas chromatography with a flame ionisation detector (GC-FID) certified that the Pistacia chinensis biodiesel mainly consisted of C 18 fatty acid methyl esters (81.07%) with a high percentage of methyl oleate. Furthermore, the measured fuel properties of the biodiesel met the required standards for fuel use. In conclusion, the Pistacia chinensis biodiesel is a qualified and feasible substitute for fossil diesel.
For incomplete datasets with mixed numerical and symbolic features, feature selection based on neighborhood multi-granulation rough sets (NMRS) is developing rapidly. However, its evaluation function only considers the information contained in the lower approximation of the neighborhood decision, which easily leads to the loss of some information. To solve this problem, we construct a novel NMRS-based uncertain measure for feature selection, named neighborhood multi-granulation self-information-based pessimistic neighborhood multi-granulation tolerance joint entropy (PTSIJE), which can be used to incomplete neighborhood decision systems. First, from the algebra view, four kinds of neighborhood multi-granulation self-information measures of decision variables are proposed by using the upper and lower approximations of NMRS. We discuss the related properties, and find the fourth measure-lenient neighborhood multi-granulation self-information measure (NMSI) has better classification performance. Then, inspired by the algebra and information views simultaneously, a feature selection method based on PTSIJE is proposed. Finally, the Fisher score method is used to delete uncorrelated features to reduce the computational complexity for high-dimensional gene datasets, and a heuristic feature selection algorithm is raised to improve classification performance for mixed and incomplete datasets. Experimental results on 11 datasets show that our method selects fewer features and has higher classification accuracy than related methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.