Algorithmic Statistics and Prediction for Polynomial Time-Bounded Algorithms

For each number of parts, there is a partition of a data set into that many parts such that every part is, as far as possible, a good model (an "algorithmic sufficient statistic") for the data in that part. Since this can be done for every number between one and the number of data items, the result is a function, the cluster structure function. It maps the number of parts of a partition to values related to how far the parts fall short of being good models. Such a function starts with a value of at least zero for no partition of the data set and descends to zero for the partition of the data set into singleton parts. The optimal clustering is selected by analyzing the cluster structure function. The theory behind the method is expressed in algorithmic information theory (Kolmogorov complexity). In practice the Kolmogorov complexities involved are approximated by a concrete compressor. We give examples using real data sets: the MNIST handwritten digits and the segmentation of real cells as used in stem cell research.
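The compressor approximation mentioned in the abstract can be illustrated with a minimal sketch. It assumes zlib as the concrete compressor and uses the common heuristic that conditional complexity is roughly the extra compressed length of a concatenation. Neither choice is taken from the paper, and the function names are hypothetical.

```python
import random
import zlib


def approx_complexity(data: bytes) -> int:
    """Proxy for K(data) in bits: length of the zlib-compressed form.

    Assumption: zlib stands in for the "concrete compressor"; the
    source does not say which compressor is used.
    """
    return 8 * len(zlib.compress(data, 9))


def approx_conditional_complexity(data: bytes, given: bytes) -> int:
    """Heuristic proxy for K(data | given): extra bits needed to
    compress `data` once `given` has already been compressed."""
    extra = approx_complexity(given + data) - approx_complexity(given)
    return max(0, extra)


# A highly regular string and a pseudo-random string of the same length.
regular = b"ab" * 500
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

print("regular:", approx_complexity(regular), "bits")
print("noisy:  ", approx_complexity(noisy), "bits")
print("noisy given itself:", approx_conditional_complexity(noisy, noisy), "bits")
```

The regular string compresses to far fewer bits than the pseudo-random one, and a string conditioned on a copy of itself costs very little. This is the qualitative behavior a compressor-based stand-in for Kolmogorov complexity is expected to show.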


“…We define a conditional probability of n-bit strings following [22]. We start with the unconditional probability.…”

Section: Probabilities Among Members Of Clusters


The Cluster Structure Function

Cohen

Vitányi

2023

IEEE Trans. Pattern Anal. Mach. Intell.


Prediction and MDL for infinite sequences

Milovanov

2024

Theory Comput Syst


We combine Solomonoff’s approach to universal prediction with algorithmic statistics and propose predicting with the computable measure that provides the best “explanation” for the observed data (in the sense of algorithmic statistics). In this way we keep the expected sum of squares of prediction errors bounded (as it is for Solomonoff’s predictor) and, moreover, guarantee that the sum of squares of prediction errors is bounded along any Martin-Löf random sequence. An extended abstract of this paper was presented at the 16th International Computer Science Symposium in Russia (CSR 2021) (Milovanov 2021).
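For orientation, the bound on the expected sum of squared prediction errors for Solomonoff's predictor is usually stated along the following lines. The notation is an assumption rather than the paper's own: M is the universal a priori semimeasure, \mu is the computable measure generating the binary sequence, K(\mu) is its prefix complexity, and c is a constant (often quoted as (\ln 2)/2).

```latex
\sum_{n=0}^{\infty}
  \mathbb{E}_{x \sim \mu}\!\left[
    \bigl( M(1 \mid x_{1:n}) - \mu(1 \mid x_{1:n}) \bigr)^{2}
  \right]
\;\le\; c \, K(\mu)
```

The abstract's second claim is stronger in a different direction: it bounds the sum along each individual Martin-Löf random sequence rather than only in expectation.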


Cited by 2 publications

References 10 publications
