The goal of lossy data compression is to reduce the storage cost of a data set X while retaining as much information as possible about something (Y) that you care about. For example, what aspects of an image X contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping X → Z ≡ f(X) that maximizes the mutual information I(Z, Y) while keeping the entropy H(Z) below some fixed threshold. We present a method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable X (an image, say) drawn from a class Y ∈ {1, ..., n} can be distilled into a vector W = f(X) ∈ ℝ^{n−1} losslessly, so that I(W, Y) = I(X, Y); for example, for a binary classification task of cats and dogs, each image X is mapped into a single real number W retaining all information that helps distinguish cats from dogs. For the n = 2 case of binary classification, we then show how W can be further compressed into a discrete variable Z = g_β(W) ∈ {1, ..., m_β} by binning W into m_β bins, in such a way that varying the parameter β sweeps out the full Pareto frontier, solving a generalization of the Discrete Information Bottleneck (DIB) problem. We argue that the most interesting points on this frontier are "corners" maximizing I(Z, Y) for a fixed number of bins m = 2, 3, ..., which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm.
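To make the quantities in this tradeoff concrete, the following minimal sketch (not the paper's implementation) bins samples of a one-dimensional distilled variable W into m bins Z and estimates H(Z) and I(Z, Y) from the empirical joint distribution; the quantile binning rule, the bin counts m, and the toy Gaussian data are illustrative assumptions only.

```python
# Illustrative sketch: bin a 1-D "distilled" variable W into m bins Z and
# estimate H(Z) and I(Z, Y) in bits. Quantile binning and the toy data are
# assumptions for this demo, not the paper's binning functions g_beta.
import numpy as np

def bin_and_score(W, Y, m=4):
    """Bin W into m quantile bins Z and return (H(Z), I(Z, Y)) in bits."""
    edges = np.quantile(W, np.linspace(0, 1, m + 1)[1:-1])  # m-1 interior cut points
    Z = np.digitize(W, edges)                               # Z in {0, ..., m-1}
    # Empirical joint distribution p(z, y)
    classes = np.unique(Y)
    joint = np.zeros((m, classes.size))
    for i, y in enumerate(classes):
        joint[:, i] = np.bincount(Z[Y == y], minlength=m)
    joint /= joint.sum()
    pz = joint.sum(axis=1)
    py = joint.sum(axis=0)
    H_Z = -np.sum(pz[pz > 0] * np.log2(pz[pz > 0]))
    nz = joint > 0
    I_ZY = np.sum(joint[nz] * np.log2(joint[nz] / np.outer(pz, py)[nz]))
    return H_Z, I_ZY

# Sweeping m = 2, 3, ... traces (H(Z), I(Z, Y)) points analogous to the
# frontier "corners" described above, here on synthetic stand-in data.
rng = np.random.default_rng(0)
Y = rng.integers(0, 2, 10_000)                   # binary labels
W = rng.normal(loc=Y.astype(float), scale=1.0)   # toy stand-in for W = f(X)
for m in (2, 3, 4, 8):
    print(m, bin_and_score(W, Y, m))
```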