Geometric deep learning is increasingly important thanks to the popularity of 3D sensors. Inspired by the recent advances in NLP domain, the self-attention transformer is introduced to consume the point clouds. We develop Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention. We demonstrate its ability to process size-varying inputs, and prove its permutation equivariance. Besides, prior work uses heuristics dependence on the input data (e.g., Furthest Point Sampling) to hierarchically select subsets of input points. Thereby, we for the first time propose an end-to-end learnable and taskagnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points. Equipped with Gumbel-Softmax, it produces a "soft" continuous subset in training phase, and a "hard" discrete subset in test phase. By selecting representative subsets in a hierarchical fashion, the networks learn a stronger representation of the input sets with lower computation cost. Experiments on classification and segmentation benchmarks show the effectiveness and efficiency of our methods. Furthermore, we propose a novel application, to process event camera stream as point clouds, and achieve a state-of-theart performance on DVS128 Gesture Dataset.
Recovering the shape and reflectance of non-Lambertian surfaces remains a challenging problem in computer vision since the view-dependent appearance invalidates traditional photo-consistency constraint. In this paper, we introduce a novel concentric multi-spectral light field (CMSLF) design that is able to recover the shape and reflectance of surfaces with arbitrary material in one shot. Our CMSLF system consists of an array of cameras arranged on concentric circles where each ring captures a specific spectrum. Coupled with a multi-spectral ring light, we are able to sample viewpoint and lighting variations in a single shot via spectral multiplexing. We further show that such concentric camera/light setting results in a unique pattern of specular changes across views that enables robust depth estimation. We formulate a physical-based reflectance model on CM-SLF to estimate depth and multi-spectral reflectance map without imposing any surface prior. Extensive synthetic and real experiments show that our method outperforms stateof-the-art light field-based techniques, especially in non-Lambertian scenes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.