As the volume of astronomical data grows exponentially, its complexity and dimensionality grow rapidly as well. Extracting information from such data is a critical and challenging problem. Some algorithms, for example, can only be applied in low-dimensional spaces, which makes feature selection and feature extraction important topics. Here we describe the difference between feature selection and feature extraction methods, and introduce the taxonomy of feature selection methods along with the characteristics of each method. We present a case study comparing the performance and computational cost of different feature selection methods. For the filter method, ReliefF and the Fisher filter are adopted; for the wrapper method, improved CHAID, linear discriminant analysis (LDA), Naive Bayes (NB) and C4.5 are taken as learners. Applied to the sample, the results indicate that the filter method is superior to the wrapper method in terms of computational cost. Moreover, different learning algorithms combined with appropriate feature selection methods may achieve better performance.
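To make the filter idea concrete, here is a minimal sketch of a Fisher-score filter in plain Python. It is not the implementation used in the case study: the scoring formula (between-class scatter divided by within-class scatter, per feature) is one common form of the Fisher score, and the function names `fisher_scores` and `select_top_k` and the toy data are assumptions made for illustration.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Return one Fisher score per feature column of X (a list of rows).

    Score = between-class scatter / within-class scatter, computed
    independently for each feature. Assumed formula, for illustration.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = mean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            between += len(values) * (mean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        # A feature that is constant within every class gets an infinite score.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def select_top_k(X, y, k):
    """Return the (sorted) indices of the k highest-scoring features."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy sample: feature 0 separates the classes, 1 and 2 do not.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 3.0, 0.9],
    [0.9, 4.0, 0.1],
    [3.0, 4.5, 0.5],
    [3.1, 3.5, 0.7],
    [2.9, 4.2, 0.2],
]
y = ["star", "star", "star", "quasar", "quasar", "quasar"]

scores = fisher_scores(X, y)
selected = select_top_k(X, y, k=1)
```

The scores are computed once from the data, without training any learner. A wrapper method would instead retrain a learner for each candidate feature subset, which is why filters tend to be cheaper.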
Targeting quasar candidates is an important task for large spectroscopic sky survey projects, and astronomers continue to seek effective approaches for separating quasars from stars. Most previous methods for this problem are either supervised methods or color-color cuts. In this work, we compare the performance of a supervised method, the Support Vector Machine (SVM), with that of an unsupervised method, the one-class SVM. The SVM performs better than the one-class SVM. However, the one-class SVM is an unsupervised algorithm, which makes it helpful for recognizing rare or mysterious objects. Combining supervised and unsupervised methods is an effective way to improve on the performance of a single classifier.
With the construction and development of ground-based and space-based observatories, astronomical data volumes have reached the terascale and even the petascale. Extracting knowledge from such huge data volumes by automated methods is a big challenge for astronomers. In response, many researchers have studied various approaches and developed different software to address this issue. For a given data mining task, we need to select an appropriate technique that suits the characteristics of the data. Moreover, all algorithms have their own pros and cons. We introduce the characteristics of astronomical data, present the taxonomy of knowledge discovery, and describe the functionalities of knowledge discovery in detail. The methods of knowledge discovery are then touched upon. Finally, the successful applications of data mining techniques in astronomy are summarized and reviewed. Facing the data avalanche in astronomy, knowledge discovery in databases (KDD) shows its superiority.
We investigate selection and weighting of features by applying a random forest algorithm to multiwavelength data. Then we employ a k-nearest neighbor method to distinguish quasars from stars. We then compare the performance of this approach based on all features, weighted features, and selected features. We find that the k-nearest neighbor approach combined with random forests effectively separates quasars from stars. The sample we used was cross-identified from different survey catalogs, i.e., the SDSS DR5, FIRST, and USNO-B1.0 catalogs. This yielded a sample of 6,479 quasars and 785 stars. A random forest approach was used to compute a weight for each attribute, which allows us to select the most important attributes. We used the k-nearest neighbor approach to discriminate between quasars and stars, and the results for three different variants are shown in Table 1. The accuracy for quasar selection is above 98% for all three variants, but the classification of stars is not as good. The overall accuracy is better than 89% and in the best case the total accuracy is 94.93%. Table 1 also shows that the performance with weighted features or selected features is slightly better than that with all features. As a consequence, if we have many input features, we generally need selection or weighting of features before we begin k-NN model building.
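The three variants compared above (all features, weighted features, selected features) can be sketched with a small k-nearest neighbor classifier that accepts per-feature weights. This is only an illustration under stated assumptions: the weights below are hypothetical stand-ins for importances that a random forest would supply, the function name `knn_predict` is invented here, and the toy points are not drawn from the SDSS, FIRST, or USNO-B1.0 sample.

```python
from collections import Counter
from math import sqrt


def knn_predict(train_X, train_y, query, k=3, weights=None):
    """Classify `query` by majority vote among its k nearest training rows.

    `weights` scales each feature's squared difference in the Euclidean
    distance. None means all features count equally; a 0/1 vector acts
    as feature selection.
    """
    if weights is None:
        weights = [1.0] * len(query)
    distances = []
    for row, label in zip(train_X, train_y):
        d = sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, query)))
        distances.append((d, label))
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 is informative, feature 1 is mostly noise.
train_X = [
    [0.0, 0.1], [0.2, 0.5], [0.1, 0.9],   # stars
    [1.0, 0.2], [1.1, 0.6], [0.9, 0.8],   # quasars
]
train_y = ["star", "star", "star", "quasar", "quasar", "quasar"]
query = [0.6, 0.5]  # closer to the quasars along feature 0

# Variant 1: all features, equal weight.
all_features = knn_predict(train_X, train_y, query, k=1)
# Variant 2: weighted features (weights assumed, e.g. from a random forest).
weighted = knn_predict(train_X, train_y, query, k=1, weights=[0.9, 0.1])
# Variant 3: selected features (keep feature 0 only).
selected = knn_predict(train_X, train_y, query, k=1, weights=[1.0, 0.0])
```

In this toy case the noisy feature pulls the unweighted nearest neighbor toward a star, while both the weighted and the selected variants pick a quasar. The effect is exaggerated here on purpose; the study itself reports only a slight improvement from weighting or selection.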