Abstract-Successful fine-grained image classification methods learn the subtle details that distinguish visually similar (sub-)classes, but the problem becomes significantly more challenging when those details are missing due to low resolution. Encouraged by the recent success of Convolutional Neural Network (CNN) architectures in image classification, we propose a novel resolution-aware deep model which combines convolutional image super-resolution and convolutional fine-grained classification into a single end-to-end model. Extensive experiments on multiple benchmarks demonstrate that the proposed model consistently outperforms conventional convolutional networks at classifying fine-grained object classes in low-resolution images.

The problem of image classification is to categorise images according to their semantic content (e.g. person, plane). Fine-grained image classification further divides classes into "sub-categories", such as the models of cars [1], the species of birds [2], the categories of flowers [3] and the breeds of dogs [4]. Fine-grained categorisation is a difficult task due to the small inter-class variance between visually similar sub-classes. The problem becomes even more challenging when the available images are low-resolution (LR) images in which many details are missing compared to their high-resolution (HR) counterparts.

Since the rise of Convolutional Neural Network (CNN) architectures in image classification [5], the accuracy of fine-grained image classification has dramatically improved and many CNN-based extensions have been proposed [6], [19]. We propose a unique end-to-end deep learning framework that combines CNN super-resolution and CNN fine-grained classification: a resolution-aware Convolutional Neural Network (RACNN) for fine-grained object categorisation in low-resolution images. To the best of our knowledge, our work is the first end-to-end learning model for low-resolution fine-grained object classification. Our main principle is simple: the higher the image resolution, the easier the classification. Our research questions are: (i) Can computational super-resolution recover some of the important details required for fine-grained image classification? (ii) Can super-resolution improve fine-grained classification accuracy? (iii) Can SR-based fine-grained classification be designed as a supervised end-to-end learning framework? Figure 1 illustrates the difference between RACNN and a conventional CNN.

I. RELATED WORK

Fine-Grained Image Categorisation - Recent algorithms for discriminating fine-grained classes (such as animal species

arXiv:1703.05393v3 [cs.CV]
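The RACNN principle above, a super-resolution module feeding a classification module in one pipeline, can be illustrated with a toy numpy sketch. This is a structural illustration only: the function names, shapes, and the nearest-neighbour upsampler and linear classifier are placeholders for the learned convolutional sub-networks described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def super_resolve(lr, scale=2):
    """Stand-in for the SR sub-network: nearest-neighbour upsampling.
    In RACNN this stage is a learned convolutional SR module."""
    return lr.repeat(scale, axis=0).repeat(scale, axis=1)

def classify(hr, W, b):
    """Stand-in for the classification sub-network: linear layer + softmax."""
    logits = hr.ravel() @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Illustrative shapes: an 8x8 LR input, 2x upscaling, 5 fine-grained classes.
lr_img = rng.random((8, 8))
hr_img = super_resolve(lr_img)            # (16, 16) recovered image
W = rng.standard_normal((16 * 16, 5)) * 0.01
b = np.zeros(5)
probs = classify(hr_img, W, b)            # one end-to-end forward pass
```

Because the two stages are composed into a single differentiable pipeline, the classification loss can back-propagate through the SR module, which is what "end-to-end" means here.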
This paper proposes a universal framework, called OVE6D, for model-based 6D object pose estimation from a single depth image and a target object mask. Our model is trained using purely synthetic data rendered from ShapeNet, and, unlike most of the existing methods, it generalizes well on new real-world objects without any fine-tuning. We achieve this by decomposing the 6D pose into viewpoint, in-plane rotation around the camera optical axis and translation, and introducing novel lightweight modules for estimating each component in a cascaded manner. The resulting network contains less than 4M parameters while demonstrating excellent performance on the challenging T-LESS and Occluded LINEMOD datasets without any dataset-specific training. We show that OVE6D outperforms some contemporary deep learning-based pose estimation methods specifically trained for individual objects or datasets with real-world training data. The implementation is available at https://github.com/dingdingcai/OVE6D-pose.
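The decomposition used by OVE6D, a viewpoint rotation composed with an in-plane rotation about the camera optical axis plus a translation, can be written out directly. The sketch below shows only the geometric composition of the three components; the estimation modules themselves are not reproduced, and the example viewpoint is arbitrary.

```python
import numpy as np

def inplane_rotation(theta):
    """Rotation about the camera optical (z) axis by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def compose_pose(R_view, theta, t):
    """Assemble a full 6D pose (4x4 homogeneous matrix) from the three
    estimated components: viewpoint rotation, in-plane angle, translation."""
    R = inplane_rotation(theta) @ R_view
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative viewpoint: a 90-degree rotation about the x-axis.
R_view = np.array([[1.0, 0.0,  0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 1.0,  0.0]])
T = compose_pose(R_view, np.pi / 2, np.array([0.1, 0.0, 0.5]))
```

Composing two valid rotations always yields a valid rotation, so each component can be estimated by its own lightweight module and combined afterwards.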
This paper presents an efficient symmetry-agnostic and correspondence-free framework, referred to as SC6D, for 6D object pose estimation from a single monocular RGB image. SC6D requires neither the 3D CAD model of the object nor any prior knowledge of the symmetries. The pose estimation is decomposed into three sub-tasks: a) object 3D rotation representation learning and matching; b) estimation of the 2D location of the object center; and c) scale-invariant distance estimation (the translation along the z-axis) via classification. SC6D is evaluated on three benchmark datasets, T-LESS, YCB-V, and ITODD, and results in state-of-the-art performance on the T-LESS dataset. Moreover, SC6D is computationally much more efficient than the previous state-of-the-art method SurfEmb. The implementation and pre-trained models are publicly available at https://github.com/dingdingcai/SC6D-pose.
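Sub-task (c), distance estimation via classification, amounts to scoring a set of candidate depths and reading off an estimate. The sketch below is a minimal illustration of that idea under assumed bin ranges and a softmax-expectation readout; SC6D's actual discretization and decoding may differ.

```python
import numpy as np

# Hypothetical candidate distances (metres); the real bin layout is an assumption here.
z_bins = np.linspace(0.2, 2.0, 64)

def z_from_classification(logits, bins):
    """Turn per-bin classification scores into a distance estimate via the
    softmax-weighted expectation over bin centres."""
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    return float(probs @ bins)

# Logits sharply peaked near 1.0 m should decode to roughly that distance.
logits = -50.0 * (z_bins - 1.0) ** 2
z_hat = z_from_classification(logits, z_bins)
```

Framing depth as classification over bins sidesteps the scale ambiguity of direct monocular regression: the network only ranks discrete hypotheses.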
This paper introduces GS-Pose, an end-to-end framework for locating and estimating the 6D pose of objects. GS-Pose begins with a set of posed RGB images of a previously unseen object and builds three distinct representations stored in a database. At inference, GS-Pose operates sequentially by locating the object in the input image, estimating its initial 6D pose using a retrieval approach, and refining the pose with a render-and-compare method. The key insight is the application of the appropriate object representation at each stage of the process. In particular, for the refinement step, we utilize 3D Gaussian splatting, a novel differentiable rendering technique that offers high rendering speed and relatively low optimization time. Off-the-shelf toolchains and commodity hardware, such as mobile phones, can be used to capture new objects to be added to the database. Extensive evaluations on the LINEMOD and OnePose-LowTexture datasets demonstrate excellent performance, establishing the new state-of-the-art. Project page: https://dingdingcai.github.io/gs-pose
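The middle stage of the GS-Pose pipeline, initial pose estimation by retrieval, reduces to a nearest-neighbour lookup in the object database. The following sketch shows a generic cosine-similarity retrieval; the embedding dimensionality, database contents, and string pose labels are all illustrative stand-ins, not the paper's actual representations.

```python
import numpy as np

def retrieve_initial_pose(query_emb, db_embs, db_poses):
    """Retrieval-based initialization: return the database pose whose
    embedding has the highest cosine similarity with the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    d = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    best = int(np.argmax(d @ q))
    return db_poses[best], best

rng = np.random.default_rng(1)
db_embs = rng.standard_normal((100, 32))       # 100 posed templates, 32-d embeddings
db_poses = [f"pose_{i}" for i in range(100)]
# A query close to entry 42 retrieves that template's pose.
query = db_embs[42] + 0.01 * rng.standard_normal(32)
pose, idx = retrieve_initial_pose(query, db_embs, db_poses)
```

The retrieved pose then serves only as an initialization; the render-and-compare refinement with 3D Gaussian splatting does the precise alignment.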