SUMMARY
We propose a multi-label feature selection method that considers feature dependencies. The proposed method avoids the prohibitive cost of computing these dependencies by using a low-rank approximation. Empirical results on several multi-label datasets demonstrate that its performance is comparable to that of recent multi-label feature selection methods while requiring less computation time.
key words: multi-label feature selection, multivariate feature selection, feature dependency, Nyström method
Introduction
Recently, with the advancement of multi-label data analysis for modern applications that involve multiple concepts [1], knowledge-mining research has provided information that is vital to achieving the distinct objectives of these applications. Such applications include conventional text categorization [2], image annotation, and sentiment analysis of brands on social network services such as Twitter [3].
Large numbers of features degrade the speed of machine learning algorithms, the generality of the extracted knowledge, and the interpretability of the resulting models [4]. Multi-label feature selection is considered an effective solution to these problems [5], [6]. Conventional multi-label feature selection methods evaluate the importance of each feature independently and therefore ignore the dependencies among features [2]. As a result, the selected feature subset is likely to contain redundant features, that is, features that are similar to one another, and thus fails to be compact [6]. To resolve this practical problem, a multi-label feature selection method must consider feature dependencies during the selection process. However, such methods typically require additional computation to evaluate these dependencies.
Lim et al. [7] recently introduced multi-label quadratic programming feature selection (MLQPFS). Its advantage is that a single quadratic function simultaneously captures the dependencies between features and labels and those among features, so no special search algorithm is required. However, MLQPFS still requires O(N^2) additional computation time, where N is the number of features, to determine the feature dependencies.
In this paper, we propose a fast multi-label feature selection method that considers feature dependencies.
To develop this method, we extended MLQPFS and reduced the computation required to determine the feature dependencies by using a low-rank approximation. As a result, the time required to determine the feature dependencies in MLQPFS decreases from O(N^2) to O(Nk), where k, which is much smaller than N, is the number of features selected from the N features for the approximation.
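The complexity reduction described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a QPFS-style objective 0.5(1 − α)wᵀQw − αfᵀw minimized over nonnegative weights that sum to one; signed Pearson correlation between features as the dependency matrix Q (chosen only because it is positive semidefinite, which the Nyström method requires); mean absolute feature–label correlation as the relevance vector f; uniform random sampling of the k features; and a projected-gradient solver. The names nystrom_factor, qp_weights, and project_to_simplex are hypothetical. The point is that the solver only needs products Qw. With a Nyström factor F satisfying Q ≈ FFᵀ, built from the N×k block of Q for the sampled features, each product costs O(Nk) instead of O(N^2), and Q is never formed.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)


def qp_weights(q_mul, f, lr, alpha=0.5, steps=500):
    """Projected gradient descent on 0.5*(1-alpha)*w'Qw - alpha*f'w over the simplex.

    q_mul(w) must return Q @ w, so Q itself never has to be stored.
    """
    w = np.full(len(f), 1.0 / len(f))
    for _ in range(steps):
        grad = (1 - alpha) * q_mul(w) - alpha * f
        w = project_to_simplex(w - lr * grad)
    return w


def nystrom_factor(Xs, k, rng, tol=1e-10):
    """Return F (N x m, m <= k) such that Q ~= F @ F.T for Q = Xs.T @ Xs / n.

    Only the N x k block of Q for k uniformly sampled features is computed.
    """
    n, N = Xs.shape
    idx = rng.choice(N, size=k, replace=False)   # uniform column sampling (assumption)
    C = Xs.T @ Xs[:, idx] / n                    # N x k: dependencies with sampled features
    vals, vecs = np.linalg.eigh(C[idx])          # k x k block among the sampled features
    keep = vals > tol * vals.max()               # drop numerically zero eigenvalues
    return C @ (vecs[:, keep] / np.sqrt(vals[keep]))


# Toy data: 50 features lying in a 3-dimensional space, two binary labels.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 3))
X = Z @ rng.normal(size=(3, 50))
Y = (Z[:, :2] > 0).astype(float)
Xs = (X - X.mean(0)) / X.std(0)
Ys = (Y - Y.mean(0)) / Y.std(0)
f = np.abs(Xs.T @ Ys / len(Xs)).mean(axis=1)   # relevance: mean |corr| with the labels

F = nystrom_factor(Xs, k=5, rng=rng)           # built from N*k dependency entries
Q = Xs.T @ Xs / len(Xs)                        # exact N x N matrix, for comparison only
lr = 1.0 / (0.5 * np.linalg.norm(F, 2) ** 2)   # step size 1/L for alpha = 0.5
w_nys = qp_weights(lambda w: F @ (F.T @ w), f, lr)  # O(N m) per iteration
w_exact = qp_weights(lambda w: Q @ w, f, lr)        # O(N^2) per iteration
print(np.argsort(w_nys)[::-1][:5], np.abs(w_nys - w_exact).max())
```

Because the toy features span only three dimensions, k = 5 sampled features recover Q exactly up to rounding, so the two weight vectors should agree. On real data the approximation error depends on how well the k sampled columns capture Q.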