2020
DOI: 10.1126/sciadv.aaw0961
Low-cost scalable discretization, prediction, and feature selection for complex systems

Abstract: Finding reliable discrete approximations of complex systems is a key prerequisite when applying many of the most popular modeling tools. Common discretization approaches (e.g., the very popular K-means clustering) are crucially limited in terms of quality, parallelizability, and cost. We introduce a low-cost, improved-quality scalable probabilistic approximation (SPA) algorithm, allowing for simultaneous data-driven optimal discretization, feature selection, and prediction. We prove its optimality, parallel eff…
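The abstract describes discretizing data by representing each sample as a convex combination of a small set of landmark points, rather than hard-assigning it to a single cluster as K-means does. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the function names `spa_like_discretize` and `project_simplex`, the fixed 1/L step size, and the alternating-update scheme are assumptions for demonstration only.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex
    (standard sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def spa_like_discretize(X, K, n_iter=100, seed=0):
    """Alternating least squares with simplex constraints: find landmarks S
    and convex weights Gamma so that X is approximated by Gamma @ S."""
    rng = np.random.default_rng(seed)
    T, _ = X.shape
    S = X[rng.choice(T, size=K, replace=False)]   # initialize landmarks from data
    Gamma = np.full((T, K), 1.0 / K)              # uniform convex weights per sample
    for _ in range(n_iter):
        # projected gradient step on the weights, step size 1/L with
        # L the spectral norm of S S^T (guarantees descent)
        step = 1.0 / (np.linalg.norm(S, 2) ** 2 + 1e-12)
        grad = (Gamma @ S - X) @ S.T
        Gamma = np.apply_along_axis(project_simplex, 1, Gamma - step * grad)
        # closed-form least-squares update of the landmarks
        S = np.linalg.lstsq(Gamma, X, rcond=None)[0]
    return Gamma, S
```

Unlike a K-means label, each row of `Gamma` is a full probability vector over the K landmarks, so the discretization retains soft-membership information.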

Cited by 25 publications (36 citation statements)
References 43 publications
“…Similar generalizations can also be constructed for convex decompositions of the data. For instance, by virtue of the decomposability of the least squares cost function, it is possible to construct a joint convex discretization of instantaneous and lagged values of the variables of interest, which forms the basis of the scalable probabilistic approximation method (Gerber et al, 2020). In this approach, the decomposition may be parameterized in terms of a transition matrix relating the weights at different times, thus naturally incorporating a temporal constraint into the discretization.…”
Section: Discussion
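The excerpt above notes that the decomposition can be parameterized through a transition matrix relating the weights at different times. A minimal sketch of why that works (the names `propagate` and `Lam` are illustrative, not from the paper): if the transition matrix is column-stochastic, it maps probability-simplex weights to probability-simplex weights, so the temporal constraint is built into the discretization.

```python
import numpy as np

def propagate(gamma_t, Lam):
    """One-step propagation of simplex weights: gamma_{t+1} = Lam @ gamma_t,
    where Lam is column-stochastic (nonnegative columns summing to one)."""
    Lam = np.asarray(Lam, dtype=float)
    assert np.all(Lam >= 0) and np.allclose(Lam.sum(axis=0), 1.0)
    return Lam @ np.asarray(gamma_t, dtype=float)
```

Because each column of `Lam` sums to one, the output weights again sum to one and stay nonnegative, so every propagated state remains a valid convex-combination weight vector.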
“…7. The least restricted version (Lee and Seung, 1997;Gerber et al, 2020) requires only that the reconstruction lies in the convex hull of the basis, or, in other words, that the weights Z satisfy the constraints…”
Section: Matrix Factorizations
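The constraint the excerpt refers to — that the reconstruction lies in the convex hull of the basis — amounts to requiring each row of the weight matrix Z to be nonnegative and sum to one. A small sketch of that check (the helper name `in_simplex` is an assumption for illustration):

```python
import numpy as np

def in_simplex(Z, atol=1e-8):
    """Check the convex-hull constraints on a weight matrix Z:
    all entries nonnegative and every row summing to one."""
    Z = np.asarray(Z, dtype=float)
    return bool(np.all(Z >= -atol) and np.allclose(Z.sum(axis=1), 1.0, atol=atol))
```

When `in_simplex(Z)` holds, each row of the product `Z @ B` is a convex combination of the rows of the basis matrix `B`, which is exactly the least-restricted factorization condition the excerpt describes.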
“…Most notably, the major difference that exists between modeling proteins and their binding sites on a quantum computer versus on any classical machine is the difference in complexity classes available to solve within. Looking at the most recently well-cited model for statistical modeling and classifications of proteins by Dr. Susanne Gerber at the University of Mainz [21], the highest degree of complexity achieved was a 3rd Degree polynomial time O(n) with bounds in Z. While this seems to be sufficient for a summation with 2 bounds, the complexity of completing this summation for an entire protein quickly becomes factorial, as each functional addition brings about another set of parameters to work through, as described here for just the kinetic energy of a protein, without taking into account the different material affects with each solvent.…”
Section: Complexity Class Comparisons