Fundamental limits for the calculation of scattering corrections within X-ray computed tomography (CT) are found within the independent atom approximation from an analysis of the cross sections, CT geometry, and the Nyquist sampling theorem, suggesting large reductions in computational time compared to existing methods. By modifying the scatter by less than 1 %, it is possible to treat some of the elastic scattering in the forward direction as inelastic to achieve a smoother elastic scattering distribution. We present an analysis showing that the number of samples required for the smoother distribution can be greatly reduced. We show that fixed forced detection can be used with many fewer points for inelastic scattering, but that for pure elastic scattering, a standard Monte Carlo calculation is preferred. We use smoothing for both elastic and inelastic scattering because the intrinsic angular resolution is much poorer than can be achieved for projective tomography. Representative numerical examples are given.
SUMMARY: Modern graphics cards provide computational capabilities that exceed those of current CPUs. As a computationally intensive problem, numerical weather prediction can benefit from the massive number of threads and the large memory throughput of the graphics architecture. In this paper, we present the key steps to integrate the Compute Unified Device Architecture (CUDA) programming framework into one key component of numerical weather prediction, the data assimilation algorithm, which incorporates observational data into the model to produce the best initial condition for the next prediction. The data assimilation algorithm studied in this paper exhibits good localization and favors parallelism. To maximize the throughput of the graphics card, we use over a million CUDA threads, global memory coalescing, and fast on-chip shared memory. We also demonstrate the differences between GPU architecture generations, from the GTX 200 series to Fermi. The experiments are carried out separately on a GTX 260 (GTX 200 series) and a GTX 460 (Fermi) graphics card. Results show a speedup of 72.1 on the GTX 260 and 92.7 on the GTX 460. These results provide attractive evidence for applying CUDA GPUs to highly demanding scientific computation.
Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that increases programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit, which makes data locality decisions more accessible. To demonstrate the HTGS API, we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈ 43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both HTGS-based implementations show good performance. In image stitching, the HTGS implementation achieves performance similar to that of the hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3× and 1.8× speedup over the multi-threaded OpenBLAS library for 16k × 16k and 32k × 32k matrices, respectively.
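The idea of expressing matrix multiplication as a graph of dependent tasks can be illustrated with a minimal sketch. This is not the HTGS API (HTGS is a separate framework whose interface is not shown in the abstract); it only assumes Python's standard `concurrent.futures` thread pool, and the names `block_product`, `block_sum`, `split`, and `task_graph_matmul` are hypothetical. Each block product is an independent task, and each output block depends on the products that feed it.

```python
from concurrent.futures import ThreadPoolExecutor


def block_product(a_block, b_block):
    """Task: multiply two small dense blocks given as lists of rows."""
    cols = list(zip(*b_block))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_block]


def block_sum(blocks):
    """Task: element-wise sum of equally sized partial products."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*blocks)]


def split(m, bs):
    """Split a square matrix into a grid of blocks of size at most bs x bs."""
    n = len(m)
    return [[[row[j:j + bs] for row in m[i:i + bs]]
             for j in range(0, n, bs)]
            for i in range(0, n, bs)]


def task_graph_matmul(a, b, bs=2, workers=4):
    """Compute a @ b for square matrices using block-level tasks."""
    a_blocks, b_blocks = split(a, bs), split(b, bs)
    g = len(a_blocks)  # number of blocks per dimension
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: one independent product task per (i, j, k) triple.
        products = {
            (i, j, k): pool.submit(block_product, a_blocks[i][k], b_blocks[k][j])
            for i in range(g) for j in range(g) for k in range(g)
        }
        # Stage 2: each output block waits on its g product tasks.
        # For simplicity the accumulation runs in the calling thread.
        c_blocks = [[block_sum([products[i, j, k].result() for k in range(g)])
                     for j in range(g)]
                    for i in range(g)]
    # Reassemble the block grid into a list of full rows.
    return [sum((c_blocks[i][j][r] for j in range(g)), [])
            for i in range(g)
            for r in range(len(c_blocks[i][0]))]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(task_graph_matmul(a, b, bs=1))
```

A real hybrid scheduler would also manage memory, data transfers, and GPU occupancy, as the abstract describes; this sketch only shows the dependency structure between block tasks.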