Recent research in image and video recognition indicates that many visual
processes can be regarded as the output of a time-varying generative
model. A natural descriptive model for visual processes is thus a statistical
distribution that varies over time. Specifically, modeling visual processes as
streams of histograms generated by a kernelized linear dynamic system turns out
to be efficient. We refer to such a model as a System of Bags. In this work, we
investigate Systems of Bags with special emphasis on dynamic scenes and dynamic
textures. Parameters of linear dynamic systems suffer from ambiguities. In
order to cope with these ambiguities in the kernelized setting, we develop a
kernelized version of the alignment distance. For its computation, we use a
Jacobi-type method and prove its convergence to a set of critical points. We
employ it as a dissimilarity measure on Systems of Bags. As such, it
outperforms other known dissimilarity measures for kernelized linear dynamic
systems, in particular the Martin Distance and the Maximum Singular Value
Distance, in every tested classification setting. A considerable margin can be
observed in settings where classification is performed with respect to an
abstract mean of video sets. For this scenario, the presented approach can
outperform state-of-the-art techniques, such as Dynamic Fractal Spectrum or
Orthogonal Tensor Dictionary Learning.