The ℓ p regression problem takes as input a matrix A ∈ R n×d , a vector b ∈ R n , and a number p ∈ [1, ∞), and it returns as output a number Z and a vector x OPT ∈ R d such that Z = min x∈R d Ax − b p = Ax OPT − b p . In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained (n ≫ d) version of this classical problem, for all p ∈ [1, ∞). The first stage of our algorithm non-uniformly samplesr 1 = O(36 p d max{p/2+1,p}+1 ) rows of A and the corresponding elements of b, and then it solves the ℓ p regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resampler 1 /ǫ 2 constraints, and then it solves the ℓ p regression problem on the new sample; we prove this is a (1+ǫ)-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for special cases of ℓ p regression, namely p = 1, 2 [11,13]. In course of proving our result, we develop two concepts-well-conditioned bases and subspace-preserving sampling-that are of independent interest.
This paper addresses the problem of finding a B-term wavelet representation of a given discrete function ƒ ∈ R n whose distance from ƒ is minimized. The problem is well understood when we seek to minimize the Euclidean distance between ƒ and its representation. The first-known algorithms for finding provably approximate representations minimizing general l p distances (including l∞) under a wide variety of compactly supported wavelet bases are presented in this paper. For the Haar basis, a polynomial time approximation scheme is demonstrated. These algorithms are applicable in the one-pass sublinear-space data stream model of computation. They generalize naturally to multiple dimensions and weighted norms. A universal representation that provides a provable approximation guarantee under all p-norms simultaneously; and the first approximation algorithms for bit-budget versions of the problem, known as adaptive quantization, are also presented. Further, it is shown that the algorithms presented here can be used to select a basis from a treestructured dictionary of bases and find a B-term representation of the given function that provably approximates its best dictionary-basis representation. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it. Abstract-This paper addresses the problem of finding a B-term wavelet representation of a given discrete function f 2 R n whose distance from f is minimized. The problem is well understood when we seek to minimize the Euclidean distance between f and its representation. The first-known algorithms for finding provably approximate representations minimizing general`p distances (including`1) under a wide variety of compactly supported wavelet bases are presented in this paper. For the Haar basis, a polynomial time approximation scheme is demonstrated. These algorithms are applicable in the one-pass sublinear-space data stream model of computation. They generalize naturally to multiple dimensions and weighted norms. A universal representation that provides a provable approximation guarantee under all p-norms simultaneously; and the first approximation algorithms for bit-budget versions of the problem, known as adaptive quantization, are also presented. Further, it is shown that the algorithms presented here can be used to select a basis from a tree-structured dictionary of bases and find a B-term representation of the given function that provably approximates its best dictionary-basis representation.
Feature selection that aims to determine and select the distinctive terms representing a best document is one of the most important steps of classification. With the feature selection, dimension of document vectors are reduced and consequently duration of the process is shortened. In this study, feature selection methods were studied in terms of dimension reduction rates, classification success rates, and dimension reduction-classification success relation. As classifiers, kNN (k-Nearest Neighbors) and SVM (Support Vector Machines) were used. 5 standard (Odds Ratio-OR, Mutual Information-MI, Information Gain-IG, Chi-Square-CHI and Document Frequency-DF), 2 combined (Union of Feature Selections-UFS and Correlation of Union of Feature Selections-CUFS) and 1 new (Sum of Term Frequency-STF) feature selection methods were tested. The application was performed by selecting 100 to 1000 terms (with an increment of 100 terms) from each class. It was seen that kNN produces much better results than SVM. STF was found out to be the most successful feature selection considering the average values in both datasets. It was also found out that CUFS, a combined model, is the one that reduces the dimension the most, accordingly, it was seen that CUFS classify the documents more successfully with less terms and in short period compared to many of the standard methods.
This paper addresses the problem of finding a B-term wavelet representation of a given discrete function ƒ ∈ R n whose distance from ƒ is minimized. The problem is well understood when we seek to minimize the Euclidean distance between ƒ and its representation. The first-known algorithms for finding provably approximate representations minimizing general l p distances (including l∞) under a wide variety of compactly supported wavelet bases are presented in this paper. For the Haar basis, a polynomial time approximation scheme is demonstrated. These algorithms are applicable in the one-pass sublinear-space data stream model of computation. They generalize naturally to multiple dimensions and weighted norms. A universal representation that provides a provable approximation guarantee under all p-norms simultaneously; and the first approximation algorithms for bit-budget versions of the problem, known as adaptive quantization, are also presented. Further, it is shown that the algorithms presented here can be used to select a basis from a treestructured dictionary of bases and find a B-term representation of the given function that provably approximates its best dictionary-basis representation. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it. Abstract-This paper addresses the problem of finding a B-term wavelet representation of a given discrete function f 2 R n whose distance from f is minimized. The problem is well understood when we seek to minimize the Euclidean distance between f and its representation. The first-known algorithms for finding provably approximate representations minimizing general`p distances (including`1) under a wide variety of compactly supported wavelet bases are presented in this paper. For the Haar basis, a polynomial time approximation scheme is demonstrated. These algorithms are applicable in the one-pass sublinear-space data stream model of computation. They generalize naturally to multiple dimensions and weighted norms. A universal representation that provides a provable approximation guarantee under all p-norms simultaneously; and the first approximation algorithms for bit-budget versions of the problem, known as adaptive quantization, are also presented. Further, it is shown that the algorithms presented here can be used to select a basis from a tree-structured dictionary of bases and find a B-term representation of the given function that provably approximates its best dictionary-basis representation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.