Clustering Dynamic Textures with the Hierarchical EM Algorithm for Modeling Video

Mumtaz, Adeel; Coviello, Emanuele; Lanckriet, Gert R. G.; Chan, Antoni B.

doi:10.1109/tpami.2012.236

Cited by 53 publications

(36 citation statements)

References 32 publications


“…In addition, most methods require prior knowledge in the form of a "clean" training video containing only the background. Dynamic texture models have also shown promise in clustering the microscopic and macroscopic motion patterns present in dynamic scenes [8][9][10]. [8] performs motion segmentation by clustering video patches using a mixture of DTs.…”

Section: Introduction (mentioning)

confidence: 99%

Joint Motion Segmentation and Background Estimation in Dynamic Scenes

Mumtaz, Zhang, Chan. 2014. 2014 IEEE Conference on Computer Vision and Pattern Recognition

We propose a joint foreground-background mixture model (FBM) that simultaneously performs background estimation and motion segmentation in complex dynamic scenes. Our FBM consists of a set of location-specific dynamic texture (DT) components, for modeling local background motion, and a set of global DT components, for modeling consistent foreground motion. We derive an EM algorithm for estimating the parameters of the FBM. We also apply spatial constraints to the FBM using a Markov random field grid, and derive a corresponding variational approximation for inference. Unlike existing approaches to background subtraction, our FBM does not require a manually selected threshold or a separate training video. Unlike existing motion segmentation techniques, our FBM can segment foreground motions over complex backgrounds with mixed motions, and detect stopped objects. Since most dynamic scene datasets only contain videos with a single foreground object over a simple background, we develop a new challenging dataset with multiple foreground objects over complex dynamic backgrounds. In experiments, we show that jointly modeling the background and foreground segments with FBM yields significant improvements in accuracy on both background estimation and motion segmentation, compared to state-of-the-art methods.


“…An alternative approach, presented in [2], [3], is based on the probabilistic framework of the DT. For each video, spatiotemporal patches are extracted using dense sampling, and a dynamic texture mixture (DTM) is learned for each video using the EM algorithm [27].…”

Section: Learning the Codebook (mentioning)

confidence: 99%

“…The bag-of-systems (BoS) representation [1], a high-level descriptor of motion in a video, has seen promising results in video texture classification [2], [3], [4]. The BoS representation of videos is analogous to the bag-of-words representation of text documents, where documents are represented by counting the occurrences of each word, or the bag-of-visual-words representation of images, where images are represented by counting the occurrences of visual codewords in the image.…”

Section: Introduction (mentioning)

confidence: 99%
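The counting step in the quoted bag-of-words analogy can be illustrated with a minimal sketch. It assumes each spatiotemporal patch has already been assigned to its best-matching codeword; the function name `bag_of_codewords` and the integer assignments are hypothetical and are not taken from the cited papers.

```python
from collections import Counter


def bag_of_codewords(assignments, codebook_size):
    """Return a normalized histogram of codeword occurrences.

    `assignments` holds one codeword index per patch (hypothetical input);
    how patches are matched to dynamic texture codewords is not shown here.
    """
    counts = Counter(assignments)
    total = len(assignments)
    if total == 0:
        return [0.0] * codebook_size
    return [counts.get(k, 0) / total for k in range(codebook_size)]


# Hypothetical assignments of 8 patches to a codebook of 4 codewords.
patch_assignments = [0, 2, 2, 1, 2, 0, 3, 2]
histogram = bag_of_codewords(patch_assignments, codebook_size=4)
print(histogram)  # [0.25, 0.125, 0.5, 0.125]
```

The resulting vector plays the role of the word-count vector in the text analogy: two videos can then be compared through their histograms.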


A Scalable and Accurate Descriptor for Dynamic Textures Using Bag of System Trees

Mumtaz, Coviello, Lanckriet, et al. 2015. IEEE Trans. Pattern Anal. Mach. Intell.

The bag-of-systems (BoS) representation is a descriptor of motion in a video, where dynamic texture (DT) codewords represent the typical motion patterns in spatio-temporal patches extracted from the video. The efficacy of the BoS descriptor depends on the richness of the codebook, which depends on the number of codewords in the codebook. However, for even modest-sized codebooks, mapping videos onto the codebook results in a heavy computational load. In this paper we propose the BoS Tree, which constructs a bottom-up hierarchy of codewords that enables efficient mapping of videos to the BoS codebook. By leveraging the tree structure to efficiently index the codewords, the BoS Tree allows for fast look-ups in the codebook and enables the practical use of larger, richer codebooks. We demonstrate the effectiveness of BoS Trees on classification of four video datasets, as well as on annotation of a video dataset and a music dataset. Finally, we show that, although the fast look-ups of the BoS Tree result in different descriptors than BoS for the same video, the overall distance (and kernel) matrices are highly correlated, resulting in similar classification performance.


“…A second dominant approach is the global modeling of videos using Linear Dynamical Systems (LDS) (Chan and Vasconcelos, 2008; Doretto et al., 2003; Mumtaz et al., 2013; Saisan et al., 2001). In its essence, LDS is a latent variable model which projects video frames to a lower dimensional space and tracks the temporal behaviour in that lower dimensional space.…”

Section: Related Work (mentioning)

confidence: 99%
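The latent variable structure described in the quote can be sketched as a small generative simulation: a low-dimensional state evolves linearly over time, and each frame is a linear projection of that state. This is an illustrative sketch only, not the estimation procedure of any cited paper; the matrices `A` and `C`, the noise levels, and the function names are assumptions chosen for the example.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def simulate_lds(A, C, x0, steps, state_noise=0.0, obs_noise=0.0, seed=0):
    """Simulate x[t+1] = A x[t] + v[t] and y[t] = C x[t] + w[t].

    v and w are zero-mean Gaussian noise with the given standard deviations.
    Returns the hidden states and the observed frames (as flat pixel lists).
    """
    rng = random.Random(seed)
    x = list(x0)
    states, frames = [], []
    for _ in range(steps):
        states.append(x)
        frames.append([val + rng.gauss(0.0, obs_noise) for val in matvec(C, x)])
        x = [val + rng.gauss(0.0, state_noise) for val in matvec(A, x)]
    return states, frames


# Assumed toy parameters: a 2-D hidden state (a damped rotation)
# observed through 3 "pixels".
A = [[0.0, -0.9],
     [0.9, 0.0]]
C = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

states, frames = simulate_lds(A, C, x0=[1.0, 0.0], steps=5,
                              state_noise=0.01, obs_noise=0.01)
for t, frame in enumerate(frames):
    print(t, [round(p, 3) for p in frame])
```

Here the frames live in a 3-dimensional pixel space while the dynamics are tracked in a 2-dimensional state space, which is the dimensionality reduction the quote refers to.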

On the Segmentation and Classification of Water in Videos

Mettes, Tan, Veltkamp. 2014. Proceedings of the 9th International Conference on Computer Vision Theory and Applications

Abstract: The automatic recognition of water entails a wide range of applications, yet little attention has been paid to solving this specific problem. Current literature generally treats the problem as a part of more general recognition tasks, such as material recognition and dynamic texture recognition, without distinctively analyzing and characterizing the visual properties of water. The algorithm presented here introduces a hybrid descriptor based on the joint spatial and temporal local behaviour of water surfaces in videos. The temporal behaviour is quantified based on temporal brightness signals of local patches, while the spatial behaviour is characterized by Local Binary Pattern histograms. Based on the hybrid descriptor, the probability of a small region being water is calculated using a Decision Forest. Furthermore, binary Markov Random Fields are used to segment the image frames. Experimental results on a new and publicly available water database and a subset of the DynTex database show the effectiveness of the method for discriminating water from other dynamic and static surfaces and objects.
