Collaborative Representation for SPD Matrices with Application to Image-Set Classification

Chu, Li; Wang, Rui; Wu, Xiaojun

doi:10.48550/arxiv.2201.08962

Cited by 1 publication

(1 citation statement)

References 17 publications


“…Yu et al proposed the contour covariance that lies on the SPD manifold as a region descriptor for accurate image classification [32]. Similarly, Chu et al proposed the modelling of image sets with covariance matrices for improved classification performance [33]. The importance and usefulness of feature modelling on the SPD manifold can be further highlighted from the design of novel deep networks and network layers, such as Variational Autoencoders [34], LSTMs [35], GRUs [36] and mapping and pooling layers [37] to handle and learn from features on the SPD manifold.…”

Section: B Manifold Background (mentioning)

confidence: 99%

Multi-Manifold Attention for Vision Transformers

Konstantinidis, Papastratis, Dimitropoulos, et al. 2023

IEEE Access


This work was supported by the EC under grant agreement 101061548 "DAFNEplus: Decentralized platform for fair creative content distribution empowering creators and communities through new digital distribution models based on digital tokens."

Abstract: Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although their performance has been greatly enhanced through highly descriptive patch embeddings and hierarchical structures, there is still limited research on utilizing additional data representations so as to refine the self-attention map of a Transformer. To address this problem, a novel attention mechanism, called multi-manifold multi-head attention, is proposed in this work to substitute the vanilla self-attention of a Transformer. The proposed mechanism models the input space in three distinct manifolds, namely Euclidean, Symmetric Positive Definite and Grassmann, thus leveraging different statistical and geometrical properties of the input for the computation of a highly descriptive attention map. In this way, the proposed attention mechanism can guide a Vision Transformer to become more attentive towards important appearance, color and texture features of an image, leading to improved classification and segmentation results, as shown by the experimental results on well-known datasets.
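As a rough illustration of what modelling features on the Symmetric Positive Definite (SPD) manifold can look like, the sketch below builds a covariance descriptor from a set of feature vectors and compares two such descriptors. It is not taken from either paper: the function names, the regularization constant `eps`, the random stand-in features, and the choice of the log-Euclidean distance are all assumptions made for this example.

```python
import numpy as np


def spd_covariance(features, eps=1e-6):
    """Covariance descriptor of a set of feature vectors (one per row).

    A small multiple of the identity is added so the result is strictly
    positive definite (eps is an illustrative choice, not from the source).
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(features.shape[0] - 1, 1)
    return cov + eps * np.eye(cov.shape[0])


def spd_log(mat):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(a, b):
    """Frobenius norm of the difference of matrix logarithms (assumed metric)."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")


# Two hypothetical sets of 8-dimensional features (e.g. image sets or patch
# embeddings); random data stands in for real features here.
rng = np.random.default_rng(0)
set_a = rng.normal(size=(50, 8))
set_b = rng.normal(size=(60, 8)) * 2.0

cov_a = spd_covariance(set_a)
cov_b = spd_covariance(set_b)
distance = log_euclidean_distance(cov_a, cov_b)
```

Each set is summarized by a single symmetric matrix whose size depends only on the feature dimension, so sets with different numbers of samples can be compared directly.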
