Low-cost scalable discretization, prediction, and feature selection for complex systems

Gerber, Susanne; Pospíšil, Lukáš; Navandar, Mohit; Horenko, Illia

doi:10.1126/sciadv.aaw0961

Cited by 25 publications

(36 citation statements)

References 43 publications

“…Similar generalizations can also be constructed for convex decompositions of the data. For instance, by virtue of the decomposability of the least squares cost function, it is possible to construct a joint convex discretization of instantaneous and lagged values of the variables of interest, which forms the basis of the scalable probabilistic approximation method (Gerber et al, 2020). In this approach, the decomposition may be parameterized in terms of a transition matrix relating the weights at different times, thus naturally incorporating a temporal constraint into the discretization.…”

Section: Discussion (mentioning)

confidence: 99%

“…7. The least restricted version (Lee and Seung, 1997; Gerber et al, 2020) requires only that the reconstruction lies in the convex hull of the basis, or, in other words, that the weights Z satisfy the constraints…”

Section: Matrix Factorizations (mentioning)

confidence: 99%
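
The convex-hull condition named in the statement above can be illustrated with a small sketch. The quoted passage is truncated before it lists the constraints, so the form used here (non-negative weights whose columns sum to one, the standard definition of a convex combination) is an assumption, and the names `is_convex_weights`, `W`, and `X_hat` are hypothetical and not taken from the cited papers.

```python
import numpy as np

def is_convex_weights(Z, tol=1e-9):
    """Return True if every column of Z is a set of convex weights.

    Assumed convention: entries are non-negative and each column sums to 1.
    """
    Z = np.asarray(Z, dtype=float)
    nonneg = np.all(Z >= -tol)
    sums_to_one = np.allclose(Z.sum(axis=0), 1.0, atol=tol)
    return bool(nonneg and sums_to_one)

# Hypothetical basis: three points in the plane (columns are basis vectors).
W = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Weights for two samples (columns); each column is a convex combination.
Z = np.array([[0.2, 1.0],
              [0.5, 0.0],
              [0.3, 0.0]])

# Each reconstructed sample lies in the convex hull of the columns of W.
X_hat = W @ Z

# A weight matrix with a negative entry violates the assumed constraints.
Z_bad = np.array([[1.2], [-0.2], [0.0]])
```

With weights of this kind, every reconstructed column of `X_hat` is a weighted average of the basis vectors, which is what keeps it inside their convex hull.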

“…2. In addition to providing a consistent framework for defining each method, it is straightforward to incorporate additional constraints or penalties, e.g., for the purposes of feature selection or to induce sparsity (Jolliffe et al, 2003; Lee et al, 2007; Mairal et al, 2009; Witten et al, 2009; Jenatton et al, 2010; Gerber et al, 2020). Solving the resulting (usually constrained) optimization problem amounts to learning a dictionary with which to represent the data, with different methods producing different dictionaries.…”

Section: Introduction (mentioning)

confidence: 99%

“…By carefully defining the optimization problem, the learned dictionary can be tuned to target particular features in the data. Below, we demonstrate this process by utilizing a recently introduced regularized convex coding (Gerber et al, 2020), which allows for feature selection to be performed by varying a regularization parameter. By tuning the imposed regularization to optimize the reconstruction or prediction error, the relative performance of selecting a basis lying on or outside the convex hull can be compared to one that preferentially extracts cluster means.…”

Section: Introduction (mentioning)

confidence: 99%

Applications of matrix factorization methods to climate data

Harries, O’Kane (2020). Nonlin. Processes Geophys.

Abstract. An initial dimension reduction forms an integral part of many analyses in climate science. Different methods yield low-dimensional representations that are based on differing aspects of the data. Depending on the features of the data that are relevant for a given study, certain methods may be more suitable than others, for instance yielding bases that can be more easily identified with physically meaningful modes. To illustrate the distinction between particular methods and identify circumstances in which a given method might be preferred, in this paper we present a set of case studies comparing the results obtained using the traditional approaches of empirical orthogonal function analysis and k-means clustering with the more recently introduced methods such as archetypal analysis and convex coding. For data such as global sea surface temperature anomalies, in which there is a clear, dominant mode of variability, all of the methods considered yield rather similar bases with which to represent the data while differing in reconstruction accuracy for a given basis size. However, in the absence of such a clear scale separation, as in the case of daily geopotential height anomalies, the extracted bases differ much more significantly between the methods. We highlight the importance in such cases of carefully considering the relevant features of interest and of choosing the method that best targets precisely those features so as to obtain more easily interpretable results.

“…Most notably, the major difference that exists between modeling proteins and their binding sites on a quantum computer versus on any classical machine is the difference in complexity classes available to solve within. Looking at the most recently well-cited model for statistical modeling and classifications of proteins by Dr. Susanne Gerber at the University of Mainz [21], the highest degree of complexity achieved was a 3rd Degree polynomial time O(n) with bounds in Z. While this seems to be sufficient for a summation with 2 bounds, the complexity of completing this summation for an entire protein quickly becomes factorial, as each functional addition brings about another set of parameters to work through, as described here for just the kinetic energy of a protein, without taking into account the different material affects with each solvent.…”

Section: Complexity Class Comparisons (mentioning)

confidence: 99%

Utilizing Quantum Biological Techniques on a Quantum Processing Unit for Improved Protein Binding Site Determination

Sandeep, Gupta, Keenan (2020). Preprint.

Iff Technologies has constructed a tool named Polar+ that can predict protein-to-protein binding that operates faster and at a higher quality than the prominent industry standards for protein binding, including Autodock Vina and SwissDock. The ability to provide this advantage over other market leaders comes from a new approach to biophysics, dubbed many-body biological quantum systems, that are modeled using quantum processing units and quantum algorithms provided by Rigetti. This paper provides both experimental and theoretical evidence behind the validity of the quantum biology approach to protein modeling, an overview of the first experimental work completed by Polar+, and a review of results obtained compared to other tools and data found in the lab.

Analyzing Raman spectral data without separabiliy assumption

et al. 2021

Raman spectroscopy is a well-established tool for the analysis of vibration spectra, which then allow for the determination of individual substances in a chemical sample, or for their phase transitions. In time-resolved Raman spectroscopy, the vibration spectra of a chemical sample are recorded sequentially over a time interval, such that conclusions about intermediate products (transients) can be drawn within a chemical process. The observed data matrix M from a Raman spectroscopy measurement can be regarded as a matrix product of two unknown matrices W and H, where the first represents the contribution of the spectra and the latter represents the chemical spectra. One approach for obtaining W and H is non-negative matrix factorization. We propose a novel approach, which does not need the commonly used separability assumption. The performance of this approach is shown on a real-world chemical example.
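
The factorization M = W H mentioned in this abstract can be sketched with the standard multiplicative-update scheme for non-negative matrix factorization (Lee and Seung). This is only the baseline technique the abstract refers to, not the novel approach the authors propose; the function name `nmf_multiplicative`, the synthetic data, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(M, rank, n_iter=500, eps=1e-12, seed=0):
    """Approximate a non-negative matrix M by W @ H with W, H >= 0.

    Uses multiplicative updates for the Frobenius-norm objective.
    The small constant eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Update H with W fixed, then W with H fixed; both stay non-negative.
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for a data matrix: an exact rank-3 non-negative product.
rng = np.random.default_rng(1)
W_true = rng.random((20, 3))
H_true = rng.random((3, 50))
M = W_true @ H_true

W, H = nmf_multiplicative(M, rank=3)
rel_err = np.linalg.norm(M - W @ H) / np.linalg.norm(M)
print(f"relative reconstruction error: {rel_err:.3e}")
```

The recovered factors are in general not unique, which is one reason additional assumptions such as separability are often imposed in practice.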
