Learning Statistical Texture for Semantic Segmentation

Zhu, Lanyun; Ji, Deyi; Zhu, Shiping; Gan, Weihao; Wu, Wei; Yan, Junjie

doi:10.1109/cvpr46437.2021.01235

Cited by 112 publications

(36 citation statements)

References 21 publications


“…Fully convolutional network (FCN) [38] is selected as the basic structure of our network since the grasp label is pixel-level. FCN is widely used for semantic segmentation of images [39][40][41][42]. Our previous work [25] prove that FCN is effective for predicting dense grasp poses.…”

Section: A Network Architecture (mentioning)

confidence: 89%

On-Policy Pixel-Level Grasping Across the Gap Between Simulation and Reality

Wang, Chang, Liu et al. 2022

Preprint


Grasp detection in cluttered scenes is a very challenging task for robots. Generating synthetic grasping data is a popular way to train and test grasp methods, as is Dex-net and GraspNet; yet, these methods generate training grasps on 3D synthetic object models, but evaluate at images or point clouds with different distributions, which reduces performance on real scenes due to sparse grasp labels and covariate shift. To solve existing problems, we propose a novel on-policy grasp detection method, which can train and test on the same distribution with dense pixel-level grasp labels generated on RGB-D images. A Parallel-Depth Grasp Generation (PDG-Generation) method is proposed to generate a parallel depth image through a new imaging model of projecting points in parallel; then this method generates multiple candidate grasps for each pixel and obtains robust grasps through flatness detection, force-closure metric and collision detection. Then, a large comprehensive Pixel-Level Grasp Pose Dataset (PLGP-Dataset) is constructed and released; distinguished with previous datasets with off-policy data and sparse grasp samples, this dataset is the first pixel-level grasp dataset, with the on-policy distribution where grasps are generated based on depth images. Lastly, we build and test a series of pixel-level grasp detection networks with a data augmentation process for imbalance training, which learn grasp poses in a decoupled manner on the input RGB-D images. Extensive experiments show that our on-policy grasp method can largely overcome the gap between simulation and reality, and achieves the state-of-the-art performance. Code and data are provided at https://github.com/liuchunsense/PLGP-Dataset.


“…Most notable feature spaces include color intensity [14], texture homogeneity [43,58,69], multi-resolution features [74,88], and feature curvature [63,86]. More recent, deep learning approaches have translated the problem of texture representation to focus on explicit identification of materials through texture encoding [20,109,112], differential angular imaging [106], 3D surface variation estimation [31], auxiliary tactile property [85], and radiometric properties estimation such as the bidirectional reflectance distribution function (BRDF) [11,62,103] and the bidirectional texture function (BTF) [104]. Those methods seek to learn low-level features that are key to material classification and segmentation.…”

Section: Materials and Texture Identification (mentioning)

confidence: 99%

Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks

Akiva, Purri, Leotta 2021

Preprint


Self-supervised learning aims to learn image feature representations without the usage of manually annotated labels. It is often used as a precursor step to obtain useful initial network weights which contribute to faster convergence and superior performance of downstream tasks. While self-supervision allows one to reduce the domain gap between supervised and unsupervised learning without the usage of labels, the self-supervised objective still requires a strong inductive bias to downstream tasks for effective transfer learning. In this work, we present our material and texture based self-supervision method named MATTER (MATerial and TExture Representation Learning), which is inspired by classical material and texture methods. Material and texture can effectively describe any surface, including its tactile properties, color, and specularity. By extension, effective representation of material and texture can describe other semantic classes strongly associated with said material and texture. MATTER leverages multi-temporal, spatially aligned remote sensing imagery over unchanged regions to learn invariance to illumination and viewing angle as a mechanism to achieve consistency of material and texture representation. We show that our self-supervision pretraining method allows for up to 24.22% and 6.33% performance increase in unsupervised and fine-tuned setups, and up to 76% faster convergence on change detection, land cover classification, and semantic segmentation tasks.


“…The state-of-the-art results on semantic segmentation benchmarks [15,12,41,67,4,39] are achieved by DeepLab series [6,7,8], which deal with multiscale context by Atrous Spatial Pyramid Pooling (ASPP). Besides, recent works utilize attention mechanisms [18,22,66,49,60,59,21], statistical analysis [69] and advanced pooling techniques [21].…”

Section: Semantic Segmentation (mentioning)

confidence: 99%

MetaPix: Domain Transfer for Semantic Segmentation by Meta Pixel Weighting

Jian, Gao 2021

Preprint


Training a deep neural model for semantic segmentation requires collecting a large amount of pixel-level labeled data. To alleviate the data scarcity problem presented in the real world, one could utilize synthetic data whose label is easy to obtain. Previous work has shown that the performance of a semantic segmentation model can be improved by training jointly with real and synthetic examples with a proper weighting on the synthetic data. Such weighting was learned by a heuristic to maximize the similarity between synthetic and real examples. In our work, we instead learn a pixel-level weighting of the synthetic data by meta-learning, i.e., the learning of weighting should only be minimizing the loss on the target task. We achieve this by gradient-on-gradient technique to propagate the target loss back into the parameters of the weighting model. The experiments show that our method with only one single meta module can outperform a complicated combination of an adversarial feature alignment, a reconstruction loss, plus a hierarchical heuristic weighting at pixel, region and image levels.
