The topic of recovery of a structured model given a small number of linear observations has been well studied in recent years. Examples include recovering sparse or group-sparse vectors, low-rank matrices, and the sum of sparse and low-rank matrices, among others. In various applications in signal processing and machine learning, the model of interest is known to be structured in several ways at the same time, for example, a matrix that is simultaneously sparse and low-rank. An important application is the sparse phase retrieval problem, where the goal is to recover a sparse signal from phaseless measurements. In machine learning, the problem comes up when combining several regularizers that each promote a certain desired structure. Often, penalties (norms) that promote each individual structure are known and yield an order-wise optimal number of measurements (e.g., the $\ell_1$ norm for sparsity, the nuclear norm for matrix rank), so it is reasonable to minimize a combination of such norms. We show that, surprisingly, if we use multi-objective optimization with the individual norms, then we can do no better, order-wise, than an algorithm that exploits only one of the several structures. This result suggests that to fully exploit the multiple structures, we need an entirely new convex relaxation, i.e., not one that is a function of the convex relaxations used for each structure. We then specialize our results to the case of sparse and low-rank matrices. We show that a nonconvex formulation of the problem can recover the model from very few measurements, on the order of the degrees of freedom of the matrix, whereas the convex problem obtained from a combination of the $\ell_1$ and nuclear norms requires many more measurements. This proves an order-wise gap between the performance of the convex and nonconvex recovery problems in this case.
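As a concrete illustration of the combined relaxation discussed above, the sketch below attempts to recover a matrix that is simultaneously sparse and low-rank by minimizing a weighted sum of the $\ell_1$ and nuclear norms subject to random linear measurements. This is a minimal sketch, not the paper's code: the problem sizes, the weight `gamma`, and the use of CVXPY are illustrative assumptions. (The abstract's point is precisely that such combinations cannot, order-wise, beat the best single-structure relaxation.)

```python
# Minimal sketch (illustrative sizes and weight, not from the paper): recover a
# simultaneously sparse and low-rank matrix from random linear measurements by
# minimizing a combination of the l1 and nuclear norms.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, k, m = 20, 5, 120                 # matrix size, sparsity of factors, number of measurements

# Ground truth: rank-1 matrix X0 = u v^T with k-sparse factors.
u = np.zeros(n); u[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
v = np.zeros(n); v[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
X0 = np.outer(u, v)

# Gaussian measurement matrices A_i and observations y_i = <A_i, X0>.
A = rng.standard_normal((m, n, n))
y = np.array([np.sum(A[i] * X0) for i in range(m)])

# Convex relaxation: weighted combination of the two structure-inducing norms.
gamma = 1.0                          # relative weight (illustrative choice)
X = cp.Variable((n, n))
objective = cp.Minimize(cp.norm1(X) + gamma * cp.normNuc(X))
constraints = [cp.sum(cp.multiply(A[i], X)) == y[i] for i in range(m)]
cp.Problem(objective, constraints).solve()

print("relative recovery error:", np.linalg.norm(X.value - X0) / np.linalg.norm(X0))
```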
We consider the problem of learning a realization of a linear time-invariant (LTI) dynamical system from input/output data. Given a single input/output trajectory, we provide a finite-time analysis for learning the system's Markov parameters, from which a balanced realization is obtained using the classical Ho-Kalman algorithm. By proving a stability result for the Ho-Kalman algorithm and combining it with the sample complexity results for the Markov parameters, we show how much data is needed to learn a balanced realization of the system up to a desired accuracy with high probability. ‡ Version 2 has two improvements: first, the paper now uses the spectral radius rather than the largest singular value, hence applies to a larger class of systems; second, new sample complexity bounds are provided for approximating the system's Hankel operator via the estimated Markov parameters. These bounds leverage stability and treat the system as if it has logarithmic order.
Hence, if we had access to infinitely many independent $(y_t, u_{t-k})$ pairs, our task could be accomplished by simple averaging. In this work, we show that one can robustly learn these matrices from a small amount of data generated from a single realization of the system trajectory. The challenge is to use finite and dependent data points efficiently to perform reliable estimation. Observe that our problem is identical to learning the concatenated matrix $G$. The next section describes our input and output data; based on this, we formulate a least-squares procedure that estimates $G$. The estimate $\hat{G}$ will play a critical role in the identification of the system matrices.
Footnote 1: Balanced realizations give a representation of the system in a basis that orders the states in terms of their effect on the input/output behavior. This is relevant for determining the system order and for model reduction [23].
Footnote 2: While we assume diagonal covariance throughout the paper, we believe our proof strategy can be adapted to arbitrary covariance matrices.
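The following is a minimal sketch of the least-squares step described above: it simulates a single input/output trajectory of a stable LTI system and regresses $y_t$ on the most recent $T$ inputs to estimate the first $T$ Markov parameters. The system matrices, noise levels, trajectory length, and the choice of $T$ are illustrative assumptions, not values from the paper.

```python
# Sketch: estimate the Markov parameters [D, CB, CAB, ...] of an LTI system by
# least squares from one input/output trajectory (illustrative setup).
import numpy as np

rng = np.random.default_rng(0)
n_x, n_u, n_y = 3, 1, 1             # state, input, output dimensions
T, N = 10, 5000                     # number of Markov parameters, trajectory length

# A stable random system (spectral radius scaled below 1).
A = rng.standard_normal((n_x, n_x)); A *= 0.7 / max(abs(np.linalg.eigvals(A)))
B = rng.standard_normal((n_x, n_u)); C = rng.standard_normal((n_y, n_x)); D = rng.standard_normal((n_y, n_u))

# Simulate one trajectory with i.i.d. Gaussian input and small process/measurement noise.
u = rng.standard_normal((N, n_u))
x = np.zeros(n_x); Y = np.zeros((N, n_y))
for t in range(N):
    Y[t] = C @ x + D @ u[t] + 0.01 * rng.standard_normal(n_y)
    x = A @ x + B @ u[t] + 0.01 * rng.standard_normal(n_x)

# Least squares: regress y_t on the stacked recent inputs [u_t, u_{t-1}, ..., u_{t-T+1}].
U = np.hstack([u[T - 1 - k : N - k] for k in range(T)])     # shape (N-T+1, T*n_u)
G_hat, *_ = np.linalg.lstsq(U, Y[T - 1:], rcond=None)       # shape (T*n_u, n_y)
G_hat = G_hat.T                                             # rows index outputs

# True Markov parameters for comparison: [D, CB, CAB, ..., CA^{T-2}B].
G_true = np.hstack([D] + [C @ np.linalg.matrix_power(A, k) @ B for k in range(T - 1)])
print("Markov parameter estimation error:", np.linalg.norm(G_hat - G_true))
```

In the estimated matrix, the block at lag $k$ approximates the $k$-th Markov parameter; stability keeps the truncation error from inputs older than $T-1$ steps small, which is the role stability plays in the bounds above.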
Many modern neural network architectures are trained in an overparameterized regime where the parameters of the model exceed the size of the training dataset. Sufficiently overparameterized neural network architectures in principle have the capacity to fit any set of labels, including random noise. However, given the highly nonconvex nature of the training landscape, it is not clear what level and kind of overparameterization is required for first-order methods to converge to a global optimum that perfectly interpolates the labels. A number of recent theoretical works have shown that for very wide neural networks, where the number of hidden units is polynomially large in the size of the training data, gradient descent starting from a random initialization does indeed converge to a global optimum. However, in practice much more moderate levels of overparameterization seem to be sufficient, and in many cases overparameterized models seem to perfectly interpolate the training data as soon as the number of parameters exceeds the size of the training data by a constant factor. Thus there is a huge gap between the existing theoretical literature and practical experiments. In this paper we take a step towards closing this gap. Focusing on shallow neural nets and smooth activations, we show that (stochastic) gradient descent, when initialized at random, converges at a geometric rate to a nearby global optimum as soon as the square root of the number of network parameters exceeds the size of the training data. Our results also benefit from a fast convergence rate and continue to hold for non-differentiable activations such as Rectified Linear Units (ReLUs).
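The toy experiment below sketches the regime described above: a one-hidden-layer network with a smooth (softplus) activation and enough hidden units that the parameter count $kd$ exceeds $n^2$ (equivalently, $\sqrt{kd} > n$), trained on random labels with plain gradient descent. The sizes, the fixed output layer, and the learning rate are illustrative assumptions rather than the paper's setup.

```python
# Toy sketch (illustrative sizes, not the paper's experiments): one-hidden-layer
# net with smooth activation, parameter count k*d > n^2, trained by gradient
# descent on random labels.
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 10                       # training points, input dimension
k = 120                             # hidden units: k*d = 1200 > n^2 = 900
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.choice([-1.0, 1.0], size=n)                 # random labels

W = rng.standard_normal((k, d))                     # trainable hidden weights
v = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)    # output layer kept fixed (assumption)

softplus = lambda z: np.log1p(np.exp(z))            # smooth activation
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))        # its derivative

lr = 1.0                                            # illustrative step size
for it in range(2000):
    Z = X @ W.T                                     # (n, k) pre-activations
    pred = softplus(Z) @ v                          # (n,) network outputs
    resid = pred - y
    loss = 0.5 * np.mean(resid ** 2)
    # Gradient of the mean squared loss with respect to the hidden weights W.
    grad_W = ((resid[:, None] * sigmoid(Z)) * v[None, :]).T @ X / n
    W -= lr * grad_W
print("final training loss:", loss)
```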
The problem of signal recovery from the autocorrelation, or equivalently, the magnitudes of the Fourier transform, is of paramount importance in various fields of engineering. In this work, for one-dimensional signals, we give conditions which, when satisfied, allow unique recovery from the autocorrelation with very high probability. In particular, for sparse signals, we develop two non-iterative recovery algorithms. One of them is based on combinatorial analysis, which we prove can recover signals up to sparsity $o(n^{1/3})$ with very high probability, and the other is developed using a convex optimization based framework, which numerical simulations suggest can recover signals up to sparsity $o(n^{1/2})$ with very high probability.
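The short check below illustrates the equivalence the abstract starts from: for a real signal, the Fourier transform of its circular autocorrelation equals the squared magnitudes of its Fourier transform, so observing the autocorrelation is equivalent to observing the Fourier magnitudes. The signal length and sparsity are arbitrary illustrative choices; this is not one of the paper's recovery algorithms.

```python
# Numerical check of the autocorrelation <-> Fourier-magnitude equivalence.
import numpy as np

rng = np.random.default_rng(0)
n, k = 64, 5                                   # signal length and sparsity (illustrative)
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

# Circular autocorrelation a[m] = sum_t x[t] * x[(t+m) mod n].
a = np.array([np.dot(x, np.roll(x, -m)) for m in range(n)])

lhs = np.fft.fft(a)                            # Fourier transform of the autocorrelation
rhs = np.abs(np.fft.fft(x)) ** 2               # squared Fourier magnitudes of the signal
print("max mismatch:", np.max(np.abs(lhs - rhs)))   # ~1e-12, i.e., equal up to rounding
```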
We consider the problem of estimating an unknown but structured signal $x_0$ from its noisy linear observations $y = Ax_0 + z \in \mathbb{R}^m$. To the structure of $x_0$ is associated a structure-inducing convex function $f(\cdot)$. We assume that the entries of $A$ are i.i.d. standard normal $\mathcal{N}(0,1)$ and $z \sim \mathcal{N}(0, \sigma^2 I_m)$. As a measure of performance of an estimate $x^*$ of $x_0$, we consider the "Normalized Square Error" (NSE) $\|x^* - x_0\|_2^2 / \sigma^2$. For sufficiently small $\sigma$, we characterize the exact performance of two different versions of the well-known LASSO algorithm. The first estimator is obtained by solving $\arg\min_x \|y - Ax\|_2 + \lambda f(x)$. As a function of $\lambda$, we identify three distinct regions of operation; of these, we argue that "$R_{ON}$" is the most interesting one. When $\lambda \in R_{ON}$, we show that the NSE is $\mathbf{D}(\lambda \cdot \partial f(x_0)) / (m - \mathbf{D}(\lambda \cdot \partial f(x_0)))$, where $\mathbf{D}(\lambda \cdot \partial f(x_0))$ is the expected squared distance of an i.i.d. standard normal vector to the dilated subdifferential $\lambda \cdot \partial f(x_0)$. Secondly, we consider the more popular estimator $\arg\min_x \frac{1}{2}\|y - Ax\|_2^2 + \sigma\tau f(x)$. We propose a formula for the NSE of this estimator by establishing a suitable mapping between this and the previous estimator over the region $R_{ON}$. As a useful side result, we find explicit formulae for the optimal estimation performance and the optimal penalty parameters $\lambda^*$ and $\tau^*$.
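As a quick numerical companion, the sketch below Monte Carlo estimates $\mathbf{D}(\lambda \cdot \partial f(x_0))$ for $f = \ell_1$ using the closed-form distance to the dilated $\ell_1$ subdifferential, and evaluates the small-$\sigma$ NSE prediction $\mathbf{D}/(m - \mathbf{D})$ quoted above. The dimensions, sparsity, and choice of $\lambda$ are illustrative assumptions, not values from the paper.

```python
# Monte Carlo estimate of D(lambda * subdiff of l1 at x0) and the corresponding
# small-sigma NSE prediction D / (m - D) (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 500, 25, 250          # ambient dimension, sparsity of x0, measurements
lam = 2.0                       # penalty parameter (illustrative choice)

x0 = np.zeros(n)
support = rng.choice(n, k, replace=False)
x0[support] = rng.standard_normal(k)

# For f = l1, the subdifferential at x0 is {sign(x0_i)} on the support and [-1, 1]
# off the support, so the distance of g to lam * subdiff f(x0) is:
#   on-support:  |g_i - lam * sign(x0_i)|
#   off-support: max(|g_i| - lam, 0)   (soft threshold)
def sq_dist_to_dilated_subdiff(g):
    on = (g[support] - lam * np.sign(x0[support])) ** 2
    off_mask = np.ones(n, dtype=bool); off_mask[support] = False
    off = np.maximum(np.abs(g[off_mask]) - lam, 0.0) ** 2
    return on.sum() + off.sum()

trials = 2000
D = np.mean([sq_dist_to_dilated_subdiff(rng.standard_normal(n)) for _ in range(trials)])
print("D(lam * subdiff f(x0)) ~", D)
print("predicted small-sigma NSE ~", D / (m - D))    # meaningful when D < m
```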