Chenxin Tao scite author profile

Siamese Image Modeling for Self-Supervised Vision Representation Learning

Self-supervised learning (SSL) has delivered superior performance on a variety of downstream vision tasks. Two mainstream SSL frameworks have been proposed: Instance Discrimination (ID) and Masked Image Modeling (MIM). ID pulls together the representations of different views of the same image while avoiding feature collapse. It does well on linear probing but is inferior in detection performance. MIM, on the other hand, reconstructs the original content given a masked image. It excels at dense prediction but fails to perform well on linear probing. These differences arise because each framework neglects one of two representation requirements: semantic alignment or spatial sensitivity. Specifically, we observe that (1) semantic alignment demands that semantically similar views be projected into nearby representations, which can be achieved by contrasting different views with strong augmentations; (2) spatial sensitivity requires modeling the local structure within an image. Predicting dense representations from a masked image is therefore beneficial because it models the conditional distribution of image content. Driven by this analysis, we propose Siamese Image Modeling (SIM), which predicts the dense representations of an augmented view based on another masked view of the same image with different augmentations. Our method uses a Siamese network with two branches. The online branch encodes the first view and predicts the second view's representation according to the relative positions between the two views. The target branch produces the target by encoding the second view. In this way, we achieve linear probing and dense prediction performance comparable to ID and MIM, respectively. We also demonstrate that a decent linear probing result can be obtained without a global loss. Code shall be released at https://github.com/fundamentalvision/Siamese-Image-Modeling.
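The two-branch data flow described in the abstract can be illustrated with a toy sketch. This is not the released implementation: the scalar "encoder", the distance-weighted "decoder", the 1-D patch positions, the 0.75 mask ratio, and every function name below are hypothetical stand-ins chosen only to show the order of operations (mask the first view, encode it online, predict at the second view's positions, compare with the target branch's encoding).

```python
import random

def encode(patches, weight):
    """Toy per-patch encoder (assumption): scale each patch value by one weight."""
    return [weight * p for p in patches]

def random_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of the patches that stay visible."""
    num_visible = max(1, round(num_patches * (1.0 - mask_ratio)))
    return sorted(rng.sample(range(num_patches), num_visible))

def predict_second_view(visible_feats, visible_pos, target_pos):
    """Toy decoder (assumption): for each position in the second view, average
    the visible features, weighted by inverse distance to that position."""
    preds = []
    for t in target_pos:
        weights = [1.0 / (1.0 + abs(t - v)) for v in visible_pos]
        total = sum(weights)
        preds.append(sum(w * f for w, f in zip(weights, visible_feats)) / total)
    return preds

def dense_loss(preds, targets):
    """Mean squared error over all positions of the second view."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def sim_step(view_a, pos_a, view_b, pos_b, online_w, target_w,
             mask_ratio=0.75, seed=0):
    """One forward pass: online branch on masked view A, target branch on view B."""
    rng = random.Random(seed)
    keep = random_mask(len(view_a), mask_ratio, rng)
    # Online branch: encode only the visible patches of the first view.
    visible_feats = encode([view_a[i] for i in keep], online_w)
    visible_pos = [pos_a[i] for i in keep]
    # Predict dense features at the second view's positions.
    preds = predict_second_view(visible_feats, visible_pos, pos_b)
    # Target branch: encode the full second view to produce the targets.
    targets = encode(view_b, target_w)
    return dense_loss(preds, targets), keep

# Two overlapping 1-D "views": A covers positions 0..7, B covers 4..11.
loss, kept = sim_step([1.0] * 8, list(range(8)),
                      [1.0] * 8, list(range(4, 12)),
                      online_w=2.0, target_w=2.0)
print(loss, kept)
```

The loss here is computed densely, once per position of the second view, rather than on a single pooled vector, which mirrors the abstract's remark that no global loss is required.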

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

Li, Tao, Zhu et al. 2020

Preprint

We propose a general framework for searching surrogate losses for mainstream semantic segmentation metrics. This is in contrast to existing loss functions, which are manually designed for individual metrics. The searched surrogate losses generalize well to other datasets and networks. Extensive experiments on PASCAL VOC and Cityscapes demonstrate the effectiveness of our approach. Code shall be released.

* Equal contribution. † This work was done when Hao Li and Chenxin Tao were interns at SenseTime Research.
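To make the term "surrogate loss" concrete, the sketch below contrasts a thresholded IoU metric with a soft IoU loss, a commonly used hand-designed surrogate of the kind the abstract contrasts with. It is not the searched loss from the paper; the function names, the binary single-class setting, and the `eps` smoothing constant are assumptions made for illustration.

```python
def hard_iou(probs, labels, threshold=0.5):
    """IoU on thresholded predictions; thresholding makes it non-differentiable."""
    preds = [1 if p >= threshold else 0 for p in probs]
    inter = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    union = sum(1 for p, y in zip(preds, labels) if p == 1 or y == 1)
    return inter / union if union else 1.0

def soft_iou_loss(probs, labels, eps=1e-6):
    """Hand-designed surrogate: set intersection and union are replaced by
    products and sums of probabilities, so the loss varies smoothly."""
    inter = sum(p * y for p, y in zip(probs, labels))
    union = sum(p + y - p * y for p, y in zip(probs, labels))
    return 1.0 - (inter + eps) / (union + eps)

probs = [0.9, 0.8, 0.2, 0.1]   # predicted foreground probabilities per pixel
labels = [1, 1, 0, 0]          # ground-truth foreground mask
print(hard_iou(probs, labels))       # metric on thresholded predictions
print(soft_iou_loss(probs, labels))  # smooth loss usable for training
```

In this example the metric is already perfect after thresholding, while the surrogate still reports a positive loss, so it keeps pushing probabilities toward 0 and 1.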

Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

Tao, Wang, Zhu et al. 2022

Learning Channel-Wise Interactions for Binary Convolutional Neural Networks

Wang, Tao et al. 2019
