The capsule network is a novel architecture that encodes the properties and spatial relationships of features in images, and it shows encouraging results on image classification. However, the original capsule network is not suitable for classification tasks in which the detected object has complex internal representations. Hence, we propose the Multi-Scale Capsule Network, a novel variation of the capsule network that enhances its computational efficiency and representation capacity. The proposed Multi-Scale Capsule Network consists of two stages. In the first stage, structural and semantic information is obtained by multi-scale feature extraction. In the second stage, the hierarchy of features is encoded into multi-dimensional primary capsules. Moreover, we propose an improved dropout to enhance the robustness of the capsule network. Experimental results show that our method achieves competitive performance on the FashionMNIST and CIFAR10 datasets.
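Primary capsule outputs are conventionally normalised with the squash nonlinearity from the original capsule network formulation, which preserves a capsule vector's direction while mapping its length into [0, 1) so that length can act as an existence probability. A minimal pure-Python sketch of that standard function (illustrative only, not the authors' implementation):

```python
import math

def squash(s):
    """Squash nonlinearity from the original capsule network:
    v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||).
    Keeps the vector's direction; maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)  # zero vector stays zero
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * x for x in s]
```

For example, a capsule vector of length 5 is squashed to length 25/26 ≈ 0.96, approaching but never reaching 1.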
We present a conceptually simple framework for 6DoF object pose estimation, especially for autonomous driving scenarios. Our approach efficiently detects traffic participants in a monocular RGB image while simultaneously regressing their 3D translation and rotation vectors. The proposed method, 6D-VNet, extends Mask R-CNN by adding customised heads for predicting a vehicle's finer class, rotation and translation. Unlike previous methods, it is trained end-to-end. Furthermore, we show that including translational regression in the joint losses is crucial for the 6DoF pose estimation task, where object translation distance along the longitudinal axis varies significantly, e.g. in autonomous driving scenarios. Additionally, we incorporate the mutual information between traffic participants via a modified non-local block to capture the spatial dependencies among the detected objects. As opposed to the original non-local block implementation, the proposed weighting modification takes spatial neighbouring information into consideration whilst counteracting the effect of extreme gradient values. We evaluate our method on the challenging real-world Pascal3D+ dataset, and our 6D-VNet reaches 1st place in the ApolloScape challenge 3D Car Instance task [1], [2].
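The abstract does not detail the modified weighting, so for context here is only the baseline it modifies: the original non-local block computes pairwise attention weights as a softmax over embedded dot-product similarities between feature positions. A minimal pure-Python sketch of that baseline weighting (a simplification with identity embeddings, not the paper's modified version):

```python
import math

def non_local_weights(features):
    """Pairwise attention weights of a basic non-local block:
    w[i][j] = softmax over j of (f_i . f_j). Each row sums to 1,
    so every position aggregates information from all others."""
    n = len(features)
    weights = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(features[i], features[j]))
                  for j in range(n)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

The paper's modification reweights these pairwise terms using spatial neighbourhood information, which the plain dot-product softmax above ignores.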
Current state-of-the-art two-stage detectors rely heavily on region proposals to guide accurate object detection. In previous region proposal approaches, the interaction between different functional modules is only weakly correlated, which limits the performance of these approaches. In this paper, we propose a novel two-stage strong correlation learning framework, abbreviated as SC-RPN, which aims to set up a stronger relationship among the different modules in the region proposal task. Firstly, we propose a lightweight IoU-Mask branch to predict an intersection-over-union (IoU) mask and to refine region classification scores; it prevents high-quality region proposals from being filtered out. Furthermore, a sampling strategy named Size-Aware Dynamic Sampling (SADS) is proposed to ensure sampling consistency between different stages. In addition, a point-based representation is exploited to generate region proposals with stronger fitting ability. Without bells and whistles, SC-RPN achieves an AR1000 14.5% higher than that of the Region Proposal Network (RPN), surpassing all existing region proposal approaches. We also integrate SC-RPN into Fast R-CNN and Faster R-CNN to test its effectiveness on the object detection task; the experiments show gains of 3.2% and 3.8% in mAP over the original detectors.
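The IoU quantity that the IoU-Mask branch predicts is the standard overlap measure between a proposal and a ground-truth box: intersection area divided by union area. A minimal sketch of that standard computation for axis-aligned boxes (a hypothetical helper for illustration, not the authors' code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle, clamped to zero size when boxes do not overlap.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Proposal rankers and samplers typically threshold this score (e.g. IoU ≥ 0.5) to decide which proposals count as high quality.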