Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving and AR/VR. These applications interact with people in real time on edge devices and thus require low latency and low energy. Compared to projecting the point cloud to 2D space, directly processing 3D point cloud yields higher accuracy and lower #MACs. However, the extremely sparse nature of point cloud poses challenges to hardware acceleration. For example, we need to explicitly determine the nonzero outputs and search for the nonzero neighbors (mapping operation), which is unsupported in existing accelerators. Furthermore, explicit gather and scatter of sparse features are required, resulting in large data movement overhead.In this paper, we comprehensively analyze the performance bottleneck of modern point cloud networks on CPU/GPU/TPU. To address the challenges, we then present PointAcc, a novel point cloud deep learning accelerator. PointAcc maps diverse mapping operations onto one versatile ranking-based kernel, streams the sparse computation with configurable caching, and temporally fuses consecutive dense layers to reduce the memory footprint. Evaluated on 8 point cloud models across 4 applications, PointAcc achieves 3.7× speedup and 22× energy savings over RTX 2080Ti GPU. Codesigned with light-weight neural networks, PointAcc rivals the prior accelerator Mesorasi by 100× speedup with 9.1% higher accuracy running segmentation on the S3DIS dataset. PointAcc paves the way for efficient point cloud recognition.
CCS CONCEPTS• Computer systems organization → Neural networks.
Non-volatile memory (NVM)-based training-inmemory (TIME) systems have emerged that can process the NN training in an energy-efficient manner. However, the endurance of NVM cells is disappointing, rendering concerns about the lifetime of TIME systems, because the weights of NN models always need to be updated for thousands to millions of times during training. Gradient sparsification (GS) can alleviate this problem by preserving only a small portion of the gradients to update the weights. However, conventional GS will introduce non-uniform writes on different cells across the whole NVM crossbars, which significantly reduces the excepted available lifetime. Moreover, an adversary can easily launch malicious training tasks to exactly wear-out the target cells and fast break down the system.In this paper, we propose an efficient and effective framework, referred as SGS-ARS, to improve the lifetime and security of TIME systems. The framework mainly contains a structured gradient sparsification (SGS) scheme for reducing the write frequency, and an aging-aware row swapping (ARS) scheme to make the writes uniform. Meanwhile, we show that the backpropagation mechanism allows the attacker to localize and update fixed memory locations and wear them out. Therefore, we introduce Random-ARS and Refresh techniques to thwart adversarial training attacks, preventing the systems from being fast broken in an extremely short time. Our experiments show that when TIME is programmed to train ResNet-50 on Imagenet dataset, 356× lifetime extension can be achieved without sacrificing the accuracy much or incurring much hardware overhead. Under adversarial environment, the available lifetime of TIME systems can still be improved by 84×.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.