2022 26th International Conference on Pattern Recognition (ICPR)
DOI: 10.1109/icpr56361.2022.9956553
Temporal Alignment for History Representation in Reinforcement Learning

Cited by 20 publications (25 citation statements)
References 14 publications
“…We use by default an ImageNet (Deng et al., 2009) pretrained Vision Transformer (ViT-B) (Dosovitskiy et al., 2021) as the image view teacher, and we use the text encoder from CLIP (Radford et al., 2021) as the text view teacher. The image and text teacher encoders are frozen during pretraining, and smooth ℓ1-based positive-only distillation (Ermolov et al., 2021) is used. Fig.…”
Section: Methods
Mentioning confidence: 99%
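The excerpt above outlines distillation from frozen image and text view teachers with a positive-only smooth ℓ1 loss. Below is a minimal sketch of such a training step, assuming PyTorch; the names student, image_teacher, text_teacher, proj_img, and proj_txt are illustrative placeholders, not components from the cited paper.

import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_embed(teacher, x):
    # Teachers (e.g., an ImageNet-pretrained ViT-B and the CLIP text encoder)
    # stay frozen during pretraining, so no gradients flow through them.
    teacher.eval()
    return teacher(x)

def training_step(student, image_teacher, text_teacher,
                  proj_img, proj_txt, image, text):
    s = student(image)                            # student representation
    t_img = teacher_embed(image_teacher, image)   # frozen image-view target
    t_txt = teacher_embed(text_teacher, text)     # frozen text-view target
    # Positive-only distillation: smooth L1 between the student's projected
    # features and each teacher's embedding; no negative pairs are needed.
    loss = F.smooth_l1_loss(proj_img(s), t_img) \
         + F.smooth_l1_loss(proj_txt(s), t_txt)
    return loss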
“…is the distance function defined in some metric space C. To avoid representation collapsing (He et al., 2020), InfoMax-principle-based (Hjelm et al., 2019; Bachman et al., 2019) metrics like MINE (Belghazi et al., 2018) and InfoNCE (van den Oord et al., 2018) are often used (He et al., 2020; Tian et al., 2020c; Zhang et al., 2022a). For methods using positive-only transformations, the metric function can be a feature correlation measurement (Zbontar et al., 2021), ℓ2 distance (Ermolov et al., 2021), or cosine similarity (Grill et al., 2020; …).…”
Section: Knowledge Distillation: A Unified View of Generative and Con...
Mentioning confidence: 99%
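Of the metrics named in this excerpt, InfoNCE is the most widely used; the sketch below is a minimal in-batch version, assuming PyTorch, with an illustrative temperature tau that is not taken from the cited works.

import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    # z1, z2: (N, D) embeddings of two views; row i of each forms a positive
    # pair, and all other rows in the batch serve as negatives.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                           # (N, N) similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # positives on diagonal
    return F.cross_entropy(logits, labels)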
“…Before delving into the details of the constraints, we start with the definition of the autocovariance matrix: Whitening Normalization. Whitening normalization used in self-supervised computer vision tasks can scatter the batch samples to avoid degenerate embedding solutions collapsed onto a few dimensions or into a single point [47], with a stronger feature or embedding decorrelation effect than Batch Normalization (BN) [48], even though BN also additionally boosts whitening effectiveness [49]. In this work, we design a whitening normalization by employing zero-phase component analysis (ZCA) sphering [49], [50] to decorrelate any two graph signals s_i and s_j in the node embeddings H^(l).…”
Section: Three Constraints Derived From Graph Signal Decorrelation
Mentioning confidence: 99%
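As a rough illustration of ZCA sphering as a whitening normalization, the sketch below decorrelates the feature dimensions of a batch of embeddings via an eigendecomposition of the autocovariance matrix; it assumes PyTorch, and eps is an illustrative numerical stabilizer rather than a value from the cited paper.

import torch

def zca_whiten(h, eps=1e-5):
    # h: (N, D) batch of embeddings (e.g., node embeddings H^(l)).
    h = h - h.mean(dim=0, keepdim=True)        # center each feature
    cov = h.t() @ h / (h.size(0) - 1)          # (D, D) autocovariance matrix
    eigvals, eigvecs = torch.linalg.eigh(cov)  # symmetric eigendecomposition
    # ZCA sphering matrix U diag(1/sqrt(lambda)) U^T: whitens the batch while
    # staying as close as possible to the original coordinate axes.
    w = eigvecs @ torch.diag((eigvals + eps).rsqrt()) @ eigvecs.t()
    return h @ w                               # decorrelated embeddings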
“…BYOL [5] uses only positive examples' representations to compute a mean squared error and achieves comparable performance. W-MSE [21] uses a whitening transform to avoid degenerate solutions. Some contrastive models try to construct the loss function from the perspective of the feature.…”
Section: Contrastive Learning
Mentioning confidence: 99%
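For concreteness, BYOL's positive-only mean squared error on normalized projections reduces to an expression in the cosine similarity of the two views; a minimal sketch follows, assuming PyTorch, with BYOL's stop-gradient on the target branch expressed as detach (variable names are illustrative).

import torch.nn.functional as F

def byol_loss(p_online, z_target):
    # p_online: online-network prediction; z_target: target-network projection.
    p = F.normalize(p_online, dim=1)
    z = F.normalize(z_target.detach(), dim=1)  # stop-gradient on the target
    # MSE of unit vectors equals 2 - 2 * cosine similarity per positive pair.
    return (2.0 - 2.0 * (p * z).sum(dim=1)).mean()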