Temporal Human Action Segmentation via Dynamic Clustering

Zhang, Yan; Sun, He; Tang, Siyu; Neumann, Heiko

doi:10.48550/arxiv.1803.05790

Cited by 1 publication

(3 citation statements)

References 33 publications


“…Such feature mapping is equivalent to the bilinear form, in the sense that the associated kernel function, and hence the reproducing-kernel Hilbert space (RKHS), is identical. (3) We perform extensive experiments to investigate our novel bilinear pooling methods, and show that the proposed method consistently improves or is on-par with the performance of the state-of-the-art methods on diverse datasets. To our knowledge, we are the first to employ bilinear pooling in a convolutional encoder-decoder architecture for fine-grained action parsing over time.…”

Section: Introduction (mentioning)

confidence: 98%

“…Parsing fine-grained actions over time is important in many applications, which require understanding of subtle and precise operations over long-term periods, e.g. daily activities [1], surgical robots [2], human motion analysis [3] and animal behavior analysis in the lab [4]. Given a video or a generic time sequence of feature vectors, an action parsing algorithm aims at assigning each frame an action label, such that the entire sequence is partitioned into several disjoint semantic action primitives.…”

Section: Introduction (mentioning)

confidence: 99%
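The frame-wise formulation in the statement above (one action label per frame, yielding disjoint segments) can be illustrated with a minimal sketch. The function name `labels_to_segments` and the example labels are hypothetical and do not come from either paper; the sketch only shows how per-frame labels induce a partition of the sequence.

```python
from itertools import groupby


def labels_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, so consecutive
    segments are disjoint and together cover the whole sequence.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for a six-frame clip.
frames = ["reach", "reach", "cut", "cut", "cut", "place"]
print(labels_to_segments(frames))
# [('reach', 0, 2), ('cut', 2, 5), ('place', 5, 6)]
```

Half-open intervals are used here so that each segment's end equals the next segment's start, which makes the disjointness of the partition easy to check.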

“…(2) The conventional bilinear pooling aggregates the outer products of the feature vectors by averaging, and hence loses representativeness when the real data distribution is complex. (3) The conventional bilinear pooling lifts the feature dimension from d to d², causing parameter proliferation in the neural net and expensive computational cost.…”

Section: Introduction (mentioning)

confidence: 99%
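The conventional operation described in the statement above (averaging outer products, which lifts the dimension from d to d²) can be sketched as follows. This is a plain-Python illustration under the assumption that each feature is a list of d floats; the name `bilinear_pool` is hypothetical, and the code shows only the conventional averaging variant, not the learnable pooling proposed in the citing paper.

```python
def bilinear_pool(features):
    """Average the outer products x x^T over a window of d-dimensional vectors.

    Returns the pooled matrix flattened row by row, so the output
    has d * d entries for input vectors of length d.
    """
    if not features:
        raise ValueError("features must contain at least one vector")
    d = len(features[0])
    pooled = [0.0] * (d * d)
    for x in features:
        for i in range(d):
            for j in range(d):
                pooled[i * d + j] += x[i] * x[j]  # entry (i, j) of x x^T
    n = len(features)
    return [value / n for value in pooled]


# Two 2-dimensional feature vectors in one temporal window (made-up values).
window = [[1.0, 2.0], [3.0, 0.0]]
print(bilinear_pool(window))
# [5.0, 1.0, 1.0, 2.0]
```

With d = 2 the output already has 4 entries; for a feature dimension in the hundreds, the quadratic growth is what the statement refers to as parameter proliferation.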


Local Temporal Bilinear Pooling for Fine-Grained Action Parsing

Zhang, Tang, Muandet, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite


Fine-grained temporal action parsing is important in many applications, such as daily activity understanding, human motion analysis, surgical robotics and others requiring subtle and precise operations over a long-term period. In this paper we propose a novel bilinear pooling operation, which is used in intermediate layers of a temporal convolutional encoder-decoder net. In contrast to previous work, our proposed bilinear pooling is learnable and hence can capture more complex local statistics than the conventional counterpart. In addition, we introduce exact lower-dimensional representations of our bilinear forms, so that the dimensionality is reduced without information loss or extra computation. We perform extensive experiments to quantitatively analyze our model and show superior performance over other state-of-the-art pooling methods on various datasets.
