“…Qi et al. [25] give a comprehensive study of voxel-based CNNs and multi-view CNNs for 3D object classification. Beyond these, point-based approaches [11,24,15] have recently been drawing much attention; however, their performance on 3D object classification is still inferior to that of multi-view approaches. The current state-of-the-art result on the ModelNet40 benchmark dataset is reported by Wang et al. [37], which is also based on the multi-view approach.…”
Section: Related Work (mentioning)
confidence: 99%
“…Here, $v_i$ indicates the ID of the vertex where the $i$-th image of the object instance is observed. For instance, $\{v_i\}_{i=1}^{20}$ in Candidate #2 is $\{1, 5, 2, 6, 3, 7, 4, 8, 13, 15, 14, 16, 17, 18, 19, 20, 9, 11, 10, 12\}$ (Fig. 15 (b)).…”
Section: Sensitivity To Pre-defined Views Assumption (mentioning)
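To make the candidate notation concrete, the toy sketch below (not taken from the paper's code) reorders a set of multi-view images according to the Candidate #2 assignment quoted above; the image file names are hypothetical placeholders.

```python
# A minimal illustration of how a candidate view-ID assignment {v_i}
# re-indexes the 20 captured images of an object instance: image i is
# treated as if it were observed from dodecahedron vertex v_i.

# Candidate #2 from the quoted passage: v_i for i = 1..20.
candidate_2 = [1, 5, 2, 6, 3, 7, 4, 8, 13, 15, 14, 16,
               17, 18, 19, 20, 9, 11, 10, 12]

# Hypothetical list of image file names, ordered by capture index i.
images = [f"instance_view_{i:02d}.png" for i in range(1, 21)]

# Map each image to the vertex it is assumed to be observed from.
view_assignment = {v: img for v, img in zip(candidate_2, images)}

for vertex in sorted(view_assignment):
    print(f"vertex {vertex:2d} <- {view_assignment[vertex]}")
```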
We propose a Convolutional Neural Network (CNN)-based model, "RotationNet," which takes multi-view images of an object as input and jointly estimates its pose and object category. Unlike previous approaches that use known viewpoint labels for training, our method treats the viewpoint labels as latent variables, which are learned in an unsupervised manner during training using an unaligned object dataset. RotationNet is designed to use only a partial set of multi-view images for inference, which makes it useful in practical scenarios where only partial views are available. Moreover, our pose alignment strategy enables one to obtain view-specific feature representations shared across classes, which is important for maintaining high accuracy in both object categorization and pose estimation. The effectiveness of RotationNet is demonstrated by its superior performance over state-of-the-art methods for 3D object classification on the 10- and 40-class ModelNet datasets. We also show that RotationNet, even when trained without known poses, achieves state-of-the-art performance on an object pose estimation dataset. The code is available online.
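As a rough illustration of the joint category-and-pose estimation described in this abstract, the following sketch assumes per-view class scores are already available and simply searches over cyclic pose candidates for the best joint fit. The array shapes, the random scores, and the cyclic candidate set are all illustrative assumptions; this is not the released RotationNet implementation.

```python
import numpy as np

# Simplified sketch of joint category-and-pose selection: for each candidate
# pose (here a cyclic shift of viewpoint indices), sum the per-view class
# log-probabilities and keep the best (category, pose) pair.

num_views, num_classes = 12, 40            # e.g. 12 views, ModelNet40
rng = np.random.default_rng(0)
# log P(class | image i, viewpoint j): shape (num_views, num_viewpoints, num_classes)
log_probs = np.log(rng.dirichlet(np.ones(num_classes), size=(num_views, num_views)))

best = (-np.inf, None, None)
for pose in range(num_views):              # candidate pose = cyclic shift of viewpoints
    viewpoints = (np.arange(num_views) + pose) % num_views
    # Joint log-likelihood per class: sum the per-view terms under this pose.
    joint = log_probs[np.arange(num_views), viewpoints, :].sum(axis=0)
    cls = int(joint.argmax())
    if joint[cls] > best[0]:
        best = (joint[cls], cls, pose)

score, category, pose = best
print(f"predicted category {category} with pose hypothesis {pose} (score {score:.2f})")
```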
“…Choosing which feature to use can be a difficult hyper-parameter tuning task. The other option is a feature created by a network that processes all points inside the cell, such as the PointNet-style structure [27] called the voxel feature encoding (VFE) layer in VoxelNet. Each cell then possesses a 128-dimensional feature learned by the network.…”
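A minimal sketch of the PointNet-style voxel feature encoding (VFE) idea summarised in this passage, with random stand-in weights and illustrative layer sizes (only the 128-dimensional output comes from the quote):

```python
import numpy as np

# Per-cell feature: a shared per-point transform followed by max pooling
# over the points in one cell, with point-wise and cell-wise context
# concatenated before a final pooling step.

rng = np.random.default_rng(0)

def vfe_cell_feature(points, out_dim=128):
    """points: (n_points, in_dim) array of the points falling in one cell."""
    in_dim = points.shape[1]
    w1 = rng.standard_normal((in_dim, out_dim // 2))        # shared per-point MLP (random stand-in)
    pointwise = np.maximum(points @ w1, 0.0)                 # ReLU
    aggregated = pointwise.max(axis=0, keepdims=True)        # max pool over the cell
    # Concatenate point-wise and cell-wise context, then pool again
    # to a single per-cell descriptor.
    combined = np.concatenate(
        [pointwise, np.repeat(aggregated, len(points), axis=0)], axis=1)
    return combined.max(axis=0)                              # (out_dim,)

cell_points = rng.standard_normal((35, 7))  # e.g. x, y, z, reflectance + offsets
feature = vfe_cell_feature(cell_points)
print(feature.shape)  # (128,)
```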
We propose a new method for fusing a LIDAR point cloud and camera-captured images in deep convolutional neural networks (CNN). The proposed method constructs a new layer, called the sparse non-homogeneous pooling layer, to transform features between the bird's eye view and the front view. The sparse point cloud is used to construct the mapping between the two views. The pooling layer allows efficient fusion of the multi-view features at any stage of the network, which is favorable for 3D object detection using camera-LIDAR fusion in autonomous driving. A corresponding one-stage detector is designed and tested on the KITTI bird's eye view object detection dataset, producing 3D bounding boxes from the bird's eye view map. The fusion method shows significant improvement in both the speed and accuracy of pedestrian detection over other fusion-based object detection networks.
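The sketch below illustrates, under assumed shapes and a fake projection, the general idea of using the sparse point cloud as a mapping that pools front-view features into the bird's eye view grid. It is not the paper's sparse non-homogeneous pooling layer, just a plausible minimal version of such a view transform.

```python
import numpy as np

# Each LIDAR point links one image pixel to one BEV cell; front-view
# features gathered at those pixels are averaged over the points that
# fall in each cell.  All sizes and the projection are placeholders.

rng = np.random.default_rng(0)

H, W, C = 48, 160, 8                 # front-view feature map (e.g. from a camera CNN)
bev_x, bev_y = 100, 100              # BEV grid resolution
front_feat = rng.standard_normal((H, W, C))

# Fake LIDAR points: (x, y) in BEV metres and their projected pixel (u, v).
n_pts = 5000
pts_xy = rng.uniform(0, 50, size=(n_pts, 2))
pts_uv = np.stack([rng.integers(0, W, n_pts), rng.integers(0, H, n_pts)], axis=1)

# Discretise points into BEV cells (0.5 m cells here, an arbitrary choice).
cell_ix = np.clip((pts_xy / 0.5).astype(int), 0, [bev_x - 1, bev_y - 1])
flat_cell = cell_ix[:, 0] * bev_y + cell_ix[:, 1]

# Gather the front-view feature at each point's pixel and average per cell.
gathered = front_feat[pts_uv[:, 1], pts_uv[:, 0], :]          # (n_pts, C)
bev_feat = np.zeros((bev_x * bev_y, C))
counts = np.zeros(bev_x * bev_y)
np.add.at(bev_feat, flat_cell, gathered)
np.add.at(counts, flat_cell, 1)
bev_feat = (bev_feat / np.maximum(counts, 1)[:, None]).reshape(bev_x, bev_y, C)
print(bev_feat.shape)  # (100, 100, 8)
```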
Introduction

LIDAR and camera are becoming standard sensors for self-driving cars, and 3D object detection is an important part of perception in driving scenarios. 2D front-view images from cameras provide rich texture descriptions of the surroundings, while depth is hard to obtain. On the other hand, the 3D point cloud from LIDAR can provide accurate depth and reflection intensity information, but its resolution is comparatively low. Therefore, images and point clouds are complementary in accomplishing accurate and robust perception, and the fusion of these two sensors is a prerequisite for autonomous vehicles to deal with complicated driving scenarios.

The recent progress of convolutional neural networks (CNN) for classification and segmentation has invoked particular interest in applying deep neural networks (DNN) to object detection. DNN-based object detection with either LIDAR [1,2] or camera [3,4,5] has been widely explored by researchers and pushed to a very high single-frame accuracy. However, 3D object detection is still hard for networks based on a single kind of sensor, as shown in Table 1. The camera-based network obtains high average precision on 2D image bounding boxes because of the rich texture
“…This success has led researchers to apply similar methodology to 3D recognition tasks [46], facilitated by recent advances in computing that enable such tasks to be performed at scale. Seminal 3D classification datasets and efforts include ObjectNet3D [47], ShapeNet [48], VoxNet [49], and PointNet [50]. Most of these approaches focus on recognizing or creating objects with a given form and category (e.g., [51,52]), but there has been little work that seeks to derive the deeper relationship between desired functionality (e.g., performance and manufacturability) and requisite form (e.g.…”
Section: Machine Learning To Predict AM Quality (mentioning)
Additive Manufacturing (AM) allows designers to create intricate geometries that were once too complex or expensive to achieve through traditional manufacturing processes. Currently, Design for Additive Manufacturing (DfAM) is restricted to experts in the field, and novices may overlook potentially transformational design potential enabled by AM. This project aims to make DfAM accessible to a broader audience through deep learning, enabling designers of all skill levels to leverage unique AM geometries when creating new designs. To demonstrate such an approach, a database of files was acquired from industry-sponsored AM challenges focused on lightweight design. These files were converted to a voxelized format, which provides more robust information for machine learning applications. Next, an autoencoder was constructed to learn a low-dimensional representation of the part designs. Finally, that autoencoder was used to construct a deep neural network capable of predicting various DfAM attributes. This work demonstrates a novel step toward a more extensive DfAM support system that supports designers at all experience levels.
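A compact, hypothetical stand-in for the pipeline described here: voxelized parts are compressed to a low-dimensional code (a truncated SVD takes the place of the trained autoencoder) and a simple predictor is fit on that code. The data, resolution, and target attribute are synthetic placeholders, not the project's dataset or model.

```python
import numpy as np

# Flatten voxelised parts, compress to a low-dimensional code, and fit a
# simple predictor of a DfAM-style attribute on that code.

rng = np.random.default_rng(0)

n_parts, res, code_dim = 200, 16, 32
voxels = (rng.random((n_parts, res, res, res)) > 0.7).astype(float)
attribute = voxels.mean(axis=(1, 2, 3))          # fake target, e.g. relative density

X = voxels.reshape(n_parts, -1)                  # flatten each part
X_centred = X - X.mean(axis=0)

# "Encoder": project onto the top code_dim right singular vectors
# (a linear stand-in for the learned autoencoder bottleneck).
_, _, vt = np.linalg.svd(X_centred, full_matrices=False)
codes = X_centred @ vt[:code_dim].T              # (n_parts, code_dim)

# Predictor head: ordinary least squares from code to attribute.
w, *_ = np.linalg.lstsq(codes, attribute - attribute.mean(), rcond=None)
pred = codes @ w + attribute.mean()
print("mean absolute error:", np.abs(pred - attribute).mean())
```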