Hiromasa Arai scite author profile

Approximating a matrix by a small subset of its columns is a known problem in numerical linear algebra. Algorithms that address this problem have been used in areas which include, among others, sparse approximation, unsupervised feature selection, data mining, and knowledge representation. Such algorithms were investigated since the 1960's, with recent results that use randomization. The problem is believed to be NP-Hard, and to the best of our knowledge there are no previously published algorithms aimed at computing optimal solutions. We show how to model the problem as a graph search, and propose a heuristic based on eigenvalues of related matrices. Applying the A* search strategy with this heuristic is guaranteed to find the optimal solution. Experimental results on common datasets show that the proposed algorithm can effectively select columns from moderate size matrices, typically improving by orders of magnitude the run time of exhaustive search. We also show how to combine the proposed algorithm with other non-optimal (but much faster) algorithms in a ``two stage'' framework, which is guaranteed to improve the accuracy of the other algorithms.

show abstract

Unsupervised Feature Selection by Heuristic Search with Provable Bounds on Suboptimality

Arai

Maung

et al. 2016

AAAI

View full text Add to dashboard Cite

Identifying a small number of features that can represent the data is a known problem that comes up in areas such as machine learning, knowledge representation, data mining, and numerical linear algebra. Computing an optimal solution is believed to be NP-hard, and there is extensive work on approximation algorithms. Classic approaches exploit the algebraic structure of the underlying matrix, while more recent approaches use randomization. An entirely different approach that uses the A* heuristic search algorithm to find an optimal solution was recently proposed. Not surprisingly it is limited to effectively selecting only a small number of features. We propose a similar approach related to the Weighted A* algorithm. This gives algorithms that are not guaranteed to find an optimal solution but run much faster than the A* approach, enabling effective selection of many features from large datasets. We demonstrate experimentally that these new algorithms are more accurate than the current state-of-the-art while still being practical. Furthermore, they come with an adjustable guarantee on how different their error may be from the smallest possible (optimal) error. Their accuracy can always be increased at the expense of a longer running time.

show abstract

Weighted A* Algorithms for Unsupervised Feature Selection with Provable Bounds on Suboptimality

Arai

Maung

et al. 2016

AAAI

View full text Add to dashboard Cite

show abstract

Two-Stage Feature Selection with Unsupervised Second Stage

Arai

Maung

et al. 2017

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Hiromasa Arai

Telop-on-demand: video structuring and retrieval based on text recognition

Optimal Column Subset Selection by A-Star Search

Unsupervised Feature Selection by Heuristic Search with Provable Bounds on Suboptimality

Weighted A* Algorithms for Unsupervised Feature Selection with Provable Bounds on Suboptimality

Two-Stage Feature Selection with Unsupervised Second Stage

Contact Info

Product

Resources

About