Ray tracing has long been considered the next-generation technology for graphics rendering. Recently, there has been strong momentum to adopt ray-tracing-based rendering techniques on consumer-level platforms, since increasing display resolution alone can no longer meaningfully enhance user experience. The computing workload of ray tracing, however, remains overwhelming: roughly a 10-fold performance gap must be narrowed for real-time applications, even on the latest graphics processing units (GPUs). As a result, hardware acceleration techniques are critical to delivering a satisfying level of performance within an acceptable power budget. A large body of research on ray-tracing hardware has been published over the past decade. This article aims to provide a timely survey of hardware techniques for accelerating the ray-tracing algorithm. First, a quantitative profiling of the ray-tracing workload is presented. We then review hardware techniques for the main functional blocks of a ray-tracing pipeline. On this basis, ray-tracing microarchitectures for both ASICs and programmable processors are surveyed following a systematic taxonomy.
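As a rough illustration of the per-ray arithmetic that a ray-tracing pipeline's intersection unit repeats for millions of rays per frame, the sketch below implements the well-known Möller-Trumbore ray-triangle intersection test in plain Python. This is a generic textbook kernel, not code from the surveyed article, and the function and parameter names are illustrative only.

```python
# Illustrative Moller-Trumbore ray-triangle intersection test (generic sketch,
# not from the article). Shows the kind of kernel a hardware intersection unit
# would execute per ray-triangle pair.

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-8):
    """Return the hit distance t along the ray, or None if the ray misses."""
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    pvec = _cross(direction, e2)
    det = _dot(e1, pvec)
    if abs(det) < eps:                  # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    tvec = _sub(origin, v0)
    u = _dot(tvec, pvec) * inv_det      # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = _cross(tvec, e1)
    v = _dot(direction, qvec) * inv_det # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, qvec) * inv_det        # distance along the ray to the hit point
    return t if t > eps else None

# Example: a ray shot down the z-axis hits a unit triangle in the z=0 plane at t=1.
print(ray_triangle_intersect((0.2, 0.2, 1.0), (0.0, 0.0, -1.0),
                             (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```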
Bayesian computing, including sampling from probability distributions, learning graphical models, and Bayesian reasoning, is a powerful class of machine learning methods with wide applications in biological computing, financial analysis, natural language processing, autonomous driving, and robotics. The central pattern of Bayesian computing is Markov chain Monte Carlo (MCMC), which is compute-intensive and lacks explicit parallelism. In this work, we propose a parallel MCMC Bayesian computing accelerator (PMBA) architecture. Designed as a probabilistic computing platform with native support for efficient single-chain parallel Metropolis-Hastings MCMC sampling, PMBA boosts the performance of probabilistic programs with a massively parallel microarchitecture. PMBA is equipped with on-chip random number generators as its built-in source of randomness. The sampling units of PMBA perform parallel random sampling through a customized SIMD pipeline that supports data synchronization at every iteration. A companion computing framework supporting automatic parallelization and mapping of probabilistic programs is also developed. Evaluation results demonstrate that PMBA achieves a 17- to 21-fold speedup over a TITAN X GPU on MCMC sampling workloads. On probabilistic benchmarks, PMBA outperforms the best prior solutions by a factor of 3.6 to 10.3. An exemplar-based visual category learning algorithm is implemented on PMBA to demonstrate its efficiency and effectiveness on complex statistical learning problems.

Index Terms: Accelerator architectures, Bayesian methods, FPGA, MCMC, parallel machines

YUFEI NI received the B.S. degree in electronics engineering from Tsinghua University, China, in 2013. He is currently pursuing the Ph.D. degree in electronics engineering at the Institute of Microelectronics, Tsinghua University, China. His research interests include parallel computing architectures and artificial intelligence acceleration methods.
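To make the MCMC pattern described in the PMBA abstract concrete, here is a minimal NumPy sketch of single-chain random-walk Metropolis-Hastings on a toy Gaussian target. It is a generic software illustration, not PMBA's microarchitecture or framework API: the vectorized log-density stands in loosely for SIMD-style parallel arithmetic, the NumPy generator stands in for the on-chip random number generators, and the end of each loop pass marks the per-iteration synchronization point; the target, step size, and function names are assumptions for the example.

```python
# Minimal sketch of the Metropolis-Hastings kernel that PMBA-style accelerators
# target. Generic illustration only; not the article's actual implementation.
import numpy as np

def log_target(x, mean, cov_inv):
    """Unnormalized log-density of a multivariate Gaussian target (toy example)."""
    d = x - mean
    return -0.5 * d @ cov_inv @ d

def metropolis_hastings(n_samples, dim=2, step=0.5, seed=0):
    rng = np.random.default_rng(seed)    # software stand-in for on-chip RNGs
    mean = np.zeros(dim)
    cov_inv = np.eye(dim)                # assumed toy target: standard Gaussian
    x = np.zeros(dim)
    logp_x = log_target(x, mean, cov_inv)
    samples = np.empty((n_samples, dim))
    for i in range(n_samples):           # one MCMC iteration per loop pass
        proposal = x + step * rng.standard_normal(dim)   # random-walk proposal
        logp_prop = log_target(proposal, mean, cov_inv)
        if np.log(rng.random()) < logp_prop - logp_x:    # MH accept/reject
            x, logp_x = proposal, logp_prop
        samples[i] = x                   # per-iteration synchronization point
    return samples

if __name__ == "__main__":
    s = metropolis_hastings(10_000)
    print("posterior mean estimate:", s[2000:].mean(axis=0))  # discard burn-in
```

Note that each accept/reject decision depends on the previous state, which is why single-chain MCMC lacks explicit parallelism and motivates hardware support for parallel proposal evaluation and sampling within each iteration.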