6D pose estimation has been widely applied in robotic applications such as service robots, collaborative robots, and unmanned warehouses. However, accurate 6D pose estimation remains a challenging problem because application scenarios are complicated by illumination changes, occlusion, and even truncation between objects, and prior work requires additional refinement to achieve accurate results. Aiming at both the efficiency and accuracy of 6D object pose estimation in these complex scenes, this paper presents a novel end-to-end network that effectively utilises the contextual information within a neighbourhood region of each pixel to estimate the 6D object pose from RGB-D images. Specifically, our network first applies an attention mechanism to extract effective pixel-wise dense multimodal features, which are then expanded into multi-scale dense features by integrating pixel-wise features at different scales for pose estimation. The proposed method is evaluated extensively on the LineMOD and YCB-Video datasets, and the experimental results show that it outperforms several state-of-the-art baselines in terms of average point distance and average closest point distance.
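The abstract's two key steps — attention-weighted fusion of RGB and depth features per pixel, followed by multi-scale aggregation — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the attention scores here are derived from feature magnitudes (a stand-in for the learned attention layer), and the scale set and fusion rule are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(rgb_feat, depth_feat):
    """Per-pixel attention over the two modalities (H x W x C inputs).
    Scores from squared feature norms stand in for a trained scoring layer."""
    s_rgb = (rgb_feat ** 2).sum(axis=-1)          # H x W
    s_depth = (depth_feat ** 2).sum(axis=-1)      # H x W
    a = softmax(np.stack([s_rgb, s_depth], axis=-1), axis=-1)  # H x W x 2
    return a[..., :1] * rgb_feat + a[..., 1:] * depth_feat

def multi_scale(feat, scales=(1, 2)):
    """Expand dense features to multi-scale dense features: average-pool at
    each scale, upsample back, and concatenate along the channel axis."""
    H, W, C = feat.shape
    outs = [feat]
    for s in scales[1:]:
        pooled = feat.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))
        outs.append(np.repeat(np.repeat(pooled, s, axis=0), s, axis=1))
    return np.concatenate(outs, axis=-1)
```

In the actual network both steps would be learned layers; the sketch only shows how per-pixel modality weighting and scale-wise concatenation fit together.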
Recently, deep convolutional neural networks have shown strong performance on single-image deraining. These networks usually adopt conventional convolution to extract features, which can neglect the characteristic shape of rain streaks. A novel vertical module is proposed to focus on the vertical characteristic of rain streaks. The module uses a 1 × X convolution kernel to extract the vertical information of rain streaks and an X × X convolution kernel to preserve relative location information. Placing this module at the front of a deraining network better separates rain streaks from the background. In addition, contrastive learning is employed to improve the performance of the model. Extensive experimental results demonstrate the superiority of deraining methods equipped with the proposed module over their base counterparts.
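A minimal sketch of the described two-branch module, assuming the branch outputs are summed (the fusion rule and the averaging kernels are illustrative stand-ins for the learned 1 × X and X × X convolution layers):

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive 'same'-padded 2-D cross-correlation for a single-channel image."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, kh - 1 - ph), (pw, kw - 1 - pw)))
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def vertical_module(img, X=3):
    """Hypothetical vertical module: a 1 x X branch for streak information
    plus an X x X branch for relative location; outputs summed (assumption)."""
    k_1xX = np.full((1, X), 1.0 / X)           # 1 x X averaging kernel
    k_XxX = np.full((X, X), 1.0 / (X * X))     # X x X averaging kernel
    return conv2d_same(img, k_1xX) + conv2d_same(img, k_XxX)
```

In a trained network both kernels would be learned per channel; the sketch only shows how the narrow and square receptive fields combine on the same input.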
Deep models trained on clean data have achieved tremendous success in fine-grained image classification. Yet they generally suffer significant performance degradation when encountering noisy labels. Existing approaches to handling label noise, though proven effective for generic object recognition, usually fail on fine-grained data. The reason is that, on fine-grained data, category differences are subtle and the training sample size is small, so deep models easily overfit the noisy labels. To improve the robustness of deep models on noisy data for fine-grained visual categorization, in this paper we propose a novel learning framework named ProtoSimi. Our method employs an adaptive label correction strategy, ensuring effective learning on limited data. Specifically, our approach exploits both global class-prototype and part class-prototype similarities to identify and correct the labels of samples. We evaluate our method on three standard benchmarks of fine-grained recognition. Experimental results show that our method outperforms existing label-noise methods by a large margin. In ablation studies, we also verify that our method is insensitive to hyper-parameter selection and can be integrated with other FGVC methods to improve generalization performance.
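The prototype-similarity label-correction idea can be illustrated with a small NumPy sketch. This is not ProtoSimi itself: it uses only a single (global) prototype per class, a cosine-similarity criterion, and a fixed threshold `tau`, all of which are simplifying assumptions.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def correct_labels(feats, labels, n_classes, tau=0.8):
    """Hypothetical prototype-based correction: build per-class prototypes
    from the current (possibly noisy) labels, then relabel a sample when it
    is closest to another class's prototype with similarity >= tau."""
    protos = np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])
    new_labels = labels.copy()
    for i, f in enumerate(feats):
        sims = np.array([cosine(f, p) for p in protos])
        best = sims.argmax()
        if best != labels[i] and sims[best] >= tau:
            new_labels[i] = best
    return new_labels
```

In the paper's setting this criterion is combined with part-level prototypes and applied adaptively during training; the sketch only shows the core mechanism of comparing a sample against class prototypes before trusting its label.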