Multi-view deep neural networks are perhaps the most successful approach to 3D shape classification. However, fusing multi-view features by max or average pooling lacks a view-selection mechanism, which limits applications such as multi-view active object recognition by a robot. This paper presents VERAM, a recurrent attention model that actively selects a sequence of views for highly accurate 3D shape classification. VERAM addresses an issue common to existing attention-based models: the unbalanced training of the subnetworks for next-view estimation and shape classification. The classification subnetwork easily overfits while the view-estimation subnetwork is usually poorly trained, leading to suboptimal classification performance. This is overcome by three essential view-enhancement strategies: 1) enhancing the flow of gradients backpropagated to the view-estimation subnetwork, 2) devising a highly informative reward function for the reinforcement training of view estimation, and 3) formulating a novel loss function that explicitly discourages view duplication. Taking grayscale images as input and AlexNet as the CNN architecture, VERAM with 9 views achieves instance-level and class-level accuracies of 95.5% and 95.3% on ModelNet10 and 93.7% and 92.1% on ModelNet40, both state-of-the-art under the same number of views.
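The abstract's third strategy, a loss term that discourages revisiting the same view, can be illustrated with a minimal sketch. The function below is a hypothetical penalty (the paper's exact formulation differs): it simply charges one unit for every revisit of an already-seen view in the selected sequence.

```python
import numpy as np

def duplication_penalty(view_sequence, strength=1.0):
    """Hypothetical penalty discouraging duplicated views in a selected
    sequence, in the spirit of VERAM's duplication-avoiding loss term
    (illustrative only; not the paper's actual formulation)."""
    _, counts = np.unique(view_sequence, return_counts=True)
    # Each revisit beyond a view's first occurrence incurs a unit penalty.
    return strength * float(np.sum(counts - 1))

# A sequence that revisits view 3 is penalized; a duplicate-free one is not.
print(duplication_penalty([0, 3, 7, 3]))  # -> 1.0
print(duplication_penalty([0, 3, 7, 9]))  # -> 0.0
```

In training, such a term would be added to the classification loss so that gradient descent steers the view-estimation subnetwork away from redundant viewpoints.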
With the emergence of diverse kinds and styles of movements in motion databases, methods that support only overall-similarity motion retrieval cannot meet the needs of practical applications. In this paper, we present an effective method based on relative geometry features to support partial-similarity human motion retrieval. The key components of our approach are effective feature selection by AdaBoost, initial feature-weight prediction for a query through a regression model, and effective relevance feedback based on feature-weight adjustment. Experimental results demonstrate the effectiveness of the proposed method.
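The relevance-feedback component adjusts per-feature weights from user-marked results. The sketch below is an illustrative inverse-variance heuristic common in retrieval systems, not the paper's exact update rule: features that vary little across the motions the user marked as relevant are treated as discriminative and receive larger weight.

```python
import numpy as np

def refine_weights(weights, relevant_features, eps=1e-6):
    """Illustrative relevance-feedback weight update (assumed heuristic,
    not the paper's rule): weight each feature by the inverse of its
    standard deviation across user-marked relevant results, then
    renormalize so the weights sum to one."""
    relevant = np.asarray(relevant_features, dtype=float)
    stds = relevant.std(axis=0) + eps  # eps avoids division by zero
    new_w = np.asarray(weights, dtype=float) / stds
    return new_w / new_w.sum()

# Feature 0 is constant across relevant results, so its weight grows.
w = refine_weights([0.5, 0.5], [[1.0, 0.0], [1.0, 10.0]])
print(w)
```

The refined weights would then rescale the per-feature distances in the next retrieval round, tightening the results toward the user's intent.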
This paper proposes a scalable method for organizing motion capture collections for overview and exploration; it mainly addresses three core problems: data abstraction, neighborhood construction, and data visualization. To alleviate the contradiction between limited visual space and the ever-increasing size of real-world datasets, hierarchical affinity propagation (HAP) is adopted to perform data abstraction on low-level pose features, generating multiple layers of data aggregations consistent with the coarse-to-fine abstraction levels of human cognition. To construct a meaningful neighborhood in which users can choose a browsing path and position themselves, a quartet-analysis-based phylogenetic tree is built upon high-level pose features to produce more reliable neighbors for the aggregations at each abstraction level. To provide a convenient interactive environment for user navigation, a phylogenetic-tree-centric visualization strategy in three-dimensional space is presented. Experimental results on the HDM05 motion capture dataset verify the effectiveness of the proposed method.
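The data-abstraction step summarizes many poses by a few exemplars per layer. The paper uses hierarchical affinity propagation, which selects exemplars by message passing; the toy sketch below only illustrates the underlying idea with a much simpler stand-in, farthest-point sampling followed by nearest-exemplar assignment.

```python
import numpy as np

def greedy_exemplars(points, k):
    """Toy stand-in for one abstraction layer (not HAP): pick k exemplars
    by farthest-point sampling, then assign every point to its nearest
    exemplar, summarizing the collection by a few representatives."""
    pts = np.asarray(points, dtype=float)
    exemplars = [0]  # seed with the first point
    for _ in range(k - 1):
        # Distance from each point to its nearest current exemplar.
        d = np.min(np.linalg.norm(pts[:, None] - pts[exemplars], axis=2), axis=1)
        exemplars.append(int(np.argmax(d)))  # farthest point joins the set
    assignments = np.argmin(
        np.linalg.norm(pts[:, None] - pts[exemplars], axis=2), axis=1
    )
    return exemplars, assignments

# Two tight groups of poses collapse into two exemplars.
ex, assign = greedy_exemplars([[0, 0], [0, 1], [10, 0], [10, 1]], 2)
print(ex, assign)
```

Applying such a summarization recursively to the exemplars of each layer yields the multi-layer, coarse-to-fine aggregation hierarchy the abstract describes.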