Malaysian medicinal plants are an abundant natural resource, but little research has been done on preserving knowledge of these plants in a form that lets the general public identify a plant from its leaf using computational methods. This preliminary study therefore presents a novel framework for identifying and classifying tropical medicinal plants in Malaysia based on patterns extracted from the leaf. The patterns are derived from several angle features of the medicinal plant leaf. However, the extraction produces a large number of attributes (features), which degrades the performance of most classifiers. Feature selection is therefore applied to the leaf data to investigate whether classifier performance can be improved. A wrapper-based genetic algorithm (GA) is used to select the features, and an ensemble classifier called Direct Ensemble Classifier for Imbalanced Multiclass Learning (DECIML) is used as the classifier. The performance of this feature selection is compared with that of two feature selection methods from Weka. In the experiment, five species of Malaysian medicinal plants, represented by 65 images, are identified and classified. This study is important for helping the local community make use of the knowledge and applications of Malaysian medicinal plants for future generations.
Ensemble learning, which combines several single classifiers or other ensemble classifiers, is one approach to the imbalance problem in multiclass data. However, it remains an open question how ensemble methods obtain their higher performance. This paper investigates the design of a meta classifier ensemble with sampling and feature selection for multiclass imbalanced data. The specific objectives were: 1) to improve the ensemble classifier through a data-level approach (sampling and feature selection); 2) to perform experiments on sampling, feature selection, and the ensemble classifier model; and 3) to evaluate the performance of the ensemble classifier. To fulfil these objectives, a preliminary collection of leaf images of Malaysian plants was prepared and used in experiments, and the results were compared. The ensemble design was also tested on three other benchmark datasets with high imbalance ratios. The design that combined sampling, feature selection, and an ensemble classifier built with AdaBoostM1 over random forest (itself an ensemble classifier) provided improved performance throughout the investigation. This result is relevant to the ongoing problem of multiclass imbalance, where a specific ensemble structure can improve both processing time and accuracy.
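The data-level sampling step mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which sampling technique was used, so plain random oversampling is assumed here purely for illustration; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (assumed sampling strategy)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Rows of the original data that belong to this class.
        members = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


# Toy three-class data with a 4:2:1 imbalance (made up for the example).
X = [[0.1], [0.2], [0.3], [0.4], [1.5], [1.6], [2.9]]
y = ["a", "a", "a", "a", "b", "b", "c"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

After balancing, the rows would be passed on to feature selection and then to the ensemble classifier; those stages are not shown here.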
Feature selection for data mining optimization is in high demand, especially for data with high-dimensional feature vectors. Feature selection chooses the best feature (or combination of features) for the data in order to achieve a similar or better classification rate. Currently, there are three types of feature selection methods: filter, wrapper, and embedded. This paper describes a genetic-based wrapper approach that optimizes the feature selection process embedded in a classification technique called supervised Nearest Neighbour Distance Matrix (NNDM). The method is implemented and tested on several datasets obtained from the UCI Machine Learning Repository and on other datasets. The results demonstrate a significant impact on predictive accuracy when feature selection is combined with the supervised NNDM in classifying new instances. The method can therefore be used in other applications that require feature dimension reduction, such as image and bioinformatics classification.
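The general idea of a genetic wrapper for feature selection can be sketched as follows. This is a minimal illustration, not the method of the paper: a leave-one-out 1-nearest-neighbour classifier stands in for the supervised NNDM, and the population size, mutation rate, per-feature penalty, and toy data are all assumptions chosen for the example.

```python
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier restricted to the
    features whose bit in `mask` is 1 (stand-in for the real classifier)."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, best_label = None, None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in idx)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        if best_label == y[i]:
            correct += 1
    return correct / len(X)


def ga_select(X, y, pop_size=20, generations=15, mutation_rate=0.1, seed=0):
    """Evolve bit masks over the features; requires at least two features."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        # Wrapper fitness: classifier accuracy minus an assumed small
        # penalty per selected feature, to favour compact subsets.
        return loo_1nn_accuracy(X, y, mask) - 0.01 * sum(mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        if fitness(scored[0]) > fitness(best):
            best = scored[0]
        parents = scored[: pop_size // 2]   # truncation selection
        children = [best[:]]                # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]      # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop + [best], key=fitness)


# Toy data: feature 0 separates the two classes, features 1-4 are noise.
noise = random.Random(1)
X = [[(10.0 if c else 0.0) + 0.1 * i] + [noise.random() for _ in range(4)]
     for c in (0, 1) for i in range(6)]
y = [c for c in (0, 1) for _ in range(6)]
mask = ga_select(X, y)
print(mask, loo_1nn_accuracy(X, y, mask))
```

Because the classifier is retrained and scored for every candidate mask, wrapper methods like this tend to cost more than filter methods, which is one reason the population and generation counts are kept small in the sketch.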