Iterative reconstruction algorithms are becoming increasingly important in electron tomography of biological samples. These algorithms, however, impose major computational demands. Parallelization must be employed to maintain acceptable running times. Graphics Processing Units (GPUs) have been demonstrated to be highly cost-effective for carrying out these computations with a high degree of parallelism. In a recent paper by Xu et al. [1], a GPU implementation strategy was presented that obtains a speedup of an order of magnitude over a previously proposed GPU-based electron tomography implementation. In this technical note, we demonstrate that by making alternative design decisions in the GPU implementation, an additional speedup can be obtained, again of an order of magnitude. By carefully considering memory access locality when dividing the workload among blocks of threads, the GPU's cache is used more efficiently, making more effective use of the available memory bandwidth.

Keywords: Electron Tomography, Reconstruction, GPU

Recently, iterative algebraic methods, such as ART and SIRT, have gained popularity in the electron tomography community due to their flexibility with respect to the geometric parameters of the tilt series, and their ability to handle noisy projection data. The use of algebraic reconstruction methods imposes major computational demands. Depending on the number of iterations, reconstructing a large 3D volume with a sequential implementation can easily take days on a normal PC. This obstacle can be largely overcome by parallelizing the computations, in particular the projection and backprojection steps. Graphics Processing Units (GPUs) have recently emerged as powerful parallel processors for general-purpose computations. Their architecture allows operations to be performed on a large number of data elements simultaneously.

Several algorithmic strategies have already been proposed for implementing algebraic methods for electron tomography on the GPU. In [2], it was demonstrated by Castaño-Diez et al. that at that time, a GPU implementation of the SIRT algorithm could achieve similar performance to a CPU implementation running on a medium sized cluster. Xu et al. recently proposed a different implementation strategy [1] that leads to a speedup of an order of magnitude compared to the results from [2]. They attribute this speedup to improvements in three categories: minimizing synchronization overhead, encouraging latency hiding, and exploiting RGBA channel parallelism. The first two design goals are interdependent and cannot be optimized separately. In this technical note, we argue that by exploiting data locality more effectively, the runtime of the projection and back projection operations can be substantially reduced, even though the required number of thread synchronization steps will increase. We demonstrate that a significant speedup can be gained in this manner.

Exploiting data locality

The Graphics Processing Unit (GPU) is well suited for carrying out the computations involved in ...
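For readers unfamiliar with SIRT, each iteration alternates a forward projection of the current estimate with a backprojection of the normalized residual, which is why those two steps dominate the running time. The sketch below is a minimal pure-Python illustration on a hypothetical 2×2 image probed by six rays. It is not the GPU implementation of [1] or of this note; the function name `sirt`, the toy matrix `A`, and the choice of inverse row and column sums as weights are assumptions made for illustration.

```python
def sirt(A, b, iterations=100):
    """Toy SIRT update: x <- x + C * A^T * R * (b - A x).

    R and C are diagonal matrices holding the inverse row sums and
    inverse column sums of A (an assumed, commonly used weighting).
    """
    n_rows, n_cols = len(A), len(A[0])
    row_sums = [sum(abs(v) for v in row) for row in A]
    col_sums = [sum(abs(A[i][j]) for i in range(n_rows)) for j in range(n_cols)]
    x = [0.0] * n_cols
    for _ in range(iterations):
        # Forward projection: simulate the measurements from the current estimate.
        proj = [sum(A[i][j] * x[j] for j in range(n_cols)) for i in range(n_rows)]
        # Residual per ray, normalized by the ray's total weight.
        resid = [
            (b[i] - proj[i]) / row_sums[i] if row_sums[i] else 0.0
            for i in range(n_rows)
        ]
        # Backprojection: spread each residual over the pixels its ray crosses.
        for j in range(n_cols):
            if col_sums[j]:
                x[j] += sum(A[i][j] * resid[i] for i in range(n_rows)) / col_sums[j]
    return x


# Hypothetical 2x2 image (pixels in row-major order) and six rays:
# two rows, two columns, and the two diagonals.
A = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
truth = [1.0, 2.0, 3.0, 4.0]
b = [sum(a * t for a, t in zip(row, truth)) for row in A]

estimate = sirt(A, b, iterations=100)
print([round(v, 3) for v in estimate])  # should be close to `truth`
```

In this sketch every pixel update in the backprojection loop is independent of the others, as is every ray in the forward projection. That independence is what makes the two steps natural candidates for parallel execution on a GPU.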
Colloidal core-shell semiconductor nanocrystals form an important class of optoelectronic materials, in which the exciton wave functions can be tailored by the atomic configuration of the core, the interfacial layers, and the shell. Here, we provide a reliable 3D characterization at the atomic scale of a free-standing PbSe(core)-CdSe(shell) nanocrystal by combining electron microscopy and discrete tomography. Our results yield unique insights for understanding the process of cation exchange, which is widely employed in the synthesis of core-shell nanocrystals. The study that we present is generally applicable to the broad range of colloidal heteronanocrystals that are currently emerging as a new class of materials with technological importance.
Diffraction contrast tomography is a near‐field diffraction‐based imaging technique that provides high‐resolution grain maps of polycrystalline materials simultaneously with the orientation and average elastic strain tensor components of the individual grains with an accuracy of a few times 10⁻⁴. Recent improvements that have been introduced into the data analysis are described. The ability to process data from arbitrary detector positions allows for optimization of the experimental setup for higher spatial or strain resolution, including high Bragg angles (0 < 2θ < 180°). The geometry refinement, grain indexing and strain analysis are based on Friedel pairs of diffraction spots and can handle thousands of grains in single‐ or multiphase materials. The grain reconstruction is performed with a simultaneous iterative reconstruction technique using three‐dimensional oblique angle projections and GPU acceleration. The improvements are demonstrated with the following experimental examples: (1) uranium oxide mapped at high spatial resolution (300 nm voxel size); (2) combined grain mapping and section topography at high Bragg angles of an Al–Li alloy; (3) ferrite and austenite crystals in a dual‐phase steel; (4) grain mapping and elastic strains of a commercially pure titanium sample containing 1755 grains.