2003
DOI: 10.1016/s0167-2789(03)00047-2

Data compression and learning in time sequences analysis

Abstract: Motivated by the problem of the definition of a distance between two sequences of characters, we investigate the so-called learning process of typical sequential data compression schemes. We focus on the problem of how a compression algorithm optimizes its features at the interface between two different sequences A and B while zipping the sequence A + B obtained by simply appending B after A. We show the existence of a universal scaling function (the so-called learning function) which rules the way in which the compression algorithm learns a sequence B after having compressed a sequence A…
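
As a concrete illustration of the A + B experiment described above, the following sketch uses Python's zlib (a DEFLATE/LZ77-family compressor) as the zipper. The function names and test strings are illustrative assumptions, not taken from the paper.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed representation of data."""
    return len(zlib.compress(data, level=9))

def interface_cost(a: bytes, b: bytes) -> int:
    """Extra bytes the zipper spends on B when B is appended after A.

    While zipping A + B the compressor reaches the interface with a
    dictionary optimized for A, so this difference measures how well
    that dictionary carries over to B.
    """
    return compressed_size(a + b) - compressed_size(a)

a = b"the quick brown fox jumps over the lazy dog " * 200
b = b"the quick brown fox jumps over the lazy dog " * 5
unrelated = bytes(range(256)) * 40

print(interface_cost(a, b))          # small: the dictionary for A fits B
print(interface_cost(unrelated, b))  # larger: nothing useful was learned
```

The gap between the two printed values is, qualitatively, what the learning function characterizes as the lengths of A and B vary.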

Cited by 36 publications (32 citation statements). References 44 publications.
“…In Figure 3 a recognition experiment is reported [20]. Here an unknown sequence is compared, in the sense discussed in the previous section, with a number of known strings.…”
Section: Examples and Numerical Results (mentioning)
confidence: 99%
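
A minimal sketch of this kind of recognition experiment, assuming that "compared" means measuring the extra compressed size incurred when the unknown sequence is appended to each known string (the corpora and names below are invented):

```python
import zlib

def c(data: bytes) -> int:
    return len(zlib.compress(data, level=9))

def remoteness(known: bytes, unknown: bytes) -> float:
    # Per-character cost of coding the unknown text with a dictionary
    # learned from the known string (a cross-entropy-like quantity).
    return (c(known + unknown) - c(known)) / len(unknown)

known_corpora = {
    "english": b"to be or not to be that is the question " * 100,
    "repeats": b"abc abc abc " * 300,
}
unknown = b"whether tis nobler in the mind to suffer"

best = min(known_corpora, key=lambda k: remoteness(known_corpora[k], unknown))
print(best)  # expected to pick "english"
```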
“…In this way the zipper "learns" the A file and, when it encounters the B subsequence, tries to compress it with a coding optimized for A. If B is not too long [20,21], thus preventing LZ77 from learning it as well, the cross entropy per character can be estimated as:…”
Section: Zippers (mentioning)
confidence: 99%
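
The estimate itself is elided in the excerpt; one estimator consistent with this line of work is the per-character difference (C(A+B) − C(A))/|B|, sketched below with zlib. The strings and the 8 bits-per-byte conversion are illustrative assumptions.

```python
import zlib

def c(data: bytes) -> int:
    return len(zlib.compress(data, level=9))

def cross_entropy_per_char(a: bytes, b: bytes) -> float:
    # Bits per character the zipper, after compressing A, spends on B.
    return 8.0 * (c(a + b) - c(a)) / len(b)

a = b"mississippi " * 1000
for n in (50, 500, 50000):
    b = (b"alabama " * (n // 8 + 1))[:n]
    print(n, round(cross_entropy_per_char(a, b), 3))
# As B grows the zipper starts learning B itself and the value drifts
# toward B's own entropy rate -- the regime refs. [20,21] caution against.
```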
“…This term is intuitively close to C(x + ∆y) − C(x) in Equation (9), as both aim at expressing a small fraction of y only in terms of x. Secondly, the term C(y + ∆y) − C(y) in Equation (9) is intuitively close to C(∆y) in Equation (10): in the former a representative dictionary extracted from y is used to code the fraction ∆y, while the latter drops any limitation on the size of the analysed objects and considers the full string y. This solves a problem raised in [30], which investigates the optimal size of ∆y in Equation (9): if ∆y is too small it does not represent y well enough, while if it is too large it injects too much information from y itself into the compression step.…”
Section: Relative Entropy (mentioning)
confidence: 96%
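
Since Equations (9) and (10) of the citing paper are not reproduced here, the sketch below only mimics the quoted description: a small probe ∆y drawn from y is coded once against x and once against y. All strings and sizes are invented.

```python
import zlib

def c(data: bytes) -> int:
    return len(zlib.compress(data, level=9))

x = b"the cat sat on the mat " * 300    # stand-in for the string x
y = b"il gatto sta sul tappeto " * 300  # stand-in for the string y
dy = y[:200]                            # a small probe ∆y drawn from y

cost_vs_x = c(x + dy) - c(x)  # plays the role of C(x + ∆y) − C(x)
cost_vs_y = c(y + dy) - c(y)  # plays the role of C(y + ∆y) − C(y)
print(cost_vs_x, cost_vs_y)   # the second should be far smaller:
                              # ∆y is already predictable from y
```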
“…In [11] it was studied in detail what happens when a compression algorithm tries to optimize its features at the interface between two different sequences A and B while zipping the sequence A + B obtained by simply appending B after A. In particular, the existence of a scaling function was shown, ruling the way the compression algorithm learns a sequence B after having compressed a sequence A.…”
Section: A Remoteness Between Two Texts (mentioning)
confidence: 99%
“…A first field of activity [11,12] is that of segmentation problems, i.e. cases in which a single string must be partitioned into subsequences according to some criterion so as to identify discontinuities in its statistical properties.…”
Section: Introduction (mentioning)
confidence: 99%
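
As a toy illustration of such a segmentation problem (a naive sketch, not the method of [11,12]): scan candidate split points of a string whose statistics change, and keep the cut that minimizes the total compressed size of the two parts.

```python
import zlib

def c(data: bytes) -> int:
    return len(zlib.compress(data, level=9))

# A string whose statistics change abruptly at position 3000.
left = b"abracadabra " * 250     # 3000 bytes of one kind of text
right = b"xylophone quiz " * 200 # 3000 bytes of another
s = left + right

# Keep the cut that minimizes the total compressed size of the parts.
candidates = range(500, len(s) - 500, 100)
best_cut = min(candidates, key=lambda i: c(s[:i]) + c(s[i:]))
print(best_cut, len(left))  # best_cut should land near the true boundary
```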