The EM algorithm for PET image reconstruction has two major drawbacks that have impeded its routine use: the long computation time due to slow convergence, and the large memory required for the image, the projection data, and the probability matrix. An attempt is made to solve these two problems by parallelizing the EM algorithm on multiprocessor systems. An efficient data and task partitioning scheme, called partition-by-box and based on the message passing model, is proposed. The partition-by-box scheme and a modified version have been implemented on a message passing system, the Intel iPSC/2, and a shared memory system, the BBN Butterfly GP1000. The implementation results show that, for the partition-by-box scheme, a message passing system with a complete binary tree interconnection and a fixed connectivity of three at each node can achieve performance similar to that of the hypercube topology, which has a connectivity of log2 N for N PEs. It is shown that the EM algorithm can be efficiently parallelized using the (modified) partition-by-box scheme with the message passing model.
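At the heart of the parallelized computation is the EM (MLEM) update, in which the current image estimate is forward-projected, compared with the measured projection data, and back-projected with a sensitivity normalization. The following is a minimal sketch of how such an iteration could be split across PEs that each own a box of image voxels and the corresponding columns of the probability matrix; it assumes NumPy and mpi4py, and the names (C_box, lam_box, y) are illustrative rather than taken from the paper's partition-by-box implementation.

```python
# Minimal sketch (not the paper's code) of one MLEM iteration where each PE
# owns a "box" of image voxels and the matching columns of the system matrix.
import numpy as np
from mpi4py import MPI

def mlem_iteration(C_box, lam_box, y, comm):
    """One EM update for the voxels in this PE's box.

    C_box   : (n_bins, n_box) columns of the probability matrix for this box
    lam_box : current image estimate restricted to this box
    y       : measured projection data, replicated on every PE
    """
    # Partial forward projection over the locally owned voxels.
    q_local = C_box @ lam_box
    # Sum the partial projections from all boxes (the message-passing step).
    q = np.empty_like(q_local)
    comm.Allreduce(q_local, q, op=MPI.SUM)
    # Back-project the measured/estimated ratio and normalize by the
    # column sums (detection sensitivities), both local to this box.
    ratio = y / np.maximum(q, 1e-12)
    sens = np.maximum(C_box.sum(axis=0), 1e-12)
    return lam_box * (C_box.T @ ratio) / sens

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    n_bins, n_box = 64, 16                        # toy sizes for illustration
    # Each PE holds its own box's columns; y is identical on every PE.
    C_box = np.random.default_rng(comm.Get_rank() + 1).random((n_bins, n_box))
    y = np.random.default_rng(0).poisson(5.0, n_bins).astype(float)
    lam_box = np.ones(n_box)
    for _ in range(10):
        lam_box = mlem_iteration(C_box, lam_box, y, comm)
```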
Particle-in-cell (PIC) is a simulation method widely used in many important scientific areas, such as plasma physics, semiconductor device physics, global climate modeling, and galaxy dynamics. In general, these simulations are extremely computationally intensive and, therefore, very time consuming even on supercomputers. We consider a PIC algorithm which simulates the behavior of charged particles in an electromagnetic field. This study is performed in order to explore parallel processing issues, such as the relationships between speedup and the problem partitioning scheme, the problem size, and the time duration of each iteration of the PIC method on different multiprocessors. A new partitioning scheme, hybrid partitioning, is introduced. Hybrid partitioning has evolved out of two general approaches to PIC problem decomposition on multiprocessors: partitioning the particles and partitioning the space. We chose the shared memory multiprocessor environment for analyzing our parallel (distributed computing) algorithms. Two different BBN Butterfly machines (GP1000 and TC2000) were employed as testbeds.
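For context, a single PIC time step interleaves four phases: scattering particle charge onto a grid, solving for the field on that grid, gathering the field back to the particle positions, and pushing the particles. The sketch below illustrates these phases for a toy 1-D electrostatic case in NumPy; it is not the paper's electromagnetic code or its hybrid partitioning scheme, and the periodic FFT field solve and all names are assumptions made for brevity. The particle arrays (x, v) and the grid arrays (rho, E) are the quantities that particle partitioning and space partitioning, respectively, would distribute across processors.

```python
# Illustrative 1-D electrostatic PIC step (NumPy only); names and the
# periodic FFT field solve are assumptions, not taken from the paper.
import numpy as np

def pic_step(x, v, q_over_m, grid_n, box_len, dt):
    dx = box_len / grid_n
    cell = np.floor(x / dx).astype(int) % grid_n
    frac = x / dx - np.floor(x / dx)
    # Scatter: deposit each particle's charge onto its two nearest grid points.
    rho = np.zeros(grid_n)
    np.add.at(rho, cell, 1.0 - frac)
    np.add.at(rho, (cell + 1) % grid_n, frac)
    # Field solve: periodic Poisson solve in Fourier space, E = -dphi/dx.
    rho_hat = np.fft.rfft(rho - rho.mean())
    k = 2.0 * np.pi * np.fft.rfftfreq(grid_n, d=dx)
    k[0] = 1.0                              # k = 0 mode was removed above
    E = np.fft.irfft(-1j * rho_hat / k, n=grid_n)
    # Gather: interpolate the field back to each particle's position.
    E_p = (1.0 - frac) * E[cell] + frac * E[(cell + 1) % grid_n]
    # Push: advance velocities, then positions (periodic box).
    v = v + q_over_m * E_p * dt
    x = (x + v * dt) % box_len
    return x, v

# Toy usage: 10,000 particles on a 128-point periodic grid.
x = np.random.default_rng(0).uniform(0.0, 1.0, 10_000)
v = np.zeros_like(x)
for _ in range(100):
    x, v = pic_step(x, v, q_over_m=-1.0, grid_n=128, box_len=1.0, dt=1e-3)
```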
A feasible form of parallel architecture is one which consists of several pipeline stages, each of which is a multiprocessor module with a large number of processing elements (PEs). In many applications, such as real-time image processing and dynamic control, the optimized computing structure takes this form. In this study, the performance of a parallel processing model of such an organization has been analyzed. In particular, the effect of interstage communication on the throughput of the model has been investigated in order to suggest an efficient way of transferring data between stages.

Parallel Processing Model

A generic representation of a task may be given as

    O = f(I),                                                  (1)

where f is a processing function to be applied to input data I and O is the output result. A task may be partitioned in both the spatial and temporal dimensions. That is, a processing function can be decomposed into several subfunctions which are applied sequentially to the data, i.e., pipelining, and the input data to each stage may be partitioned for multiprocessing when data parallelism is exploitable in that stage. Therefore, assuming K pipeline stages and M_i PEs in the i-th stage [1],

    O = C_{i=1}^{K} { f_{i,j}(I_{i,j}) : j = 1, ..., M_i },    (2)

where C represents composition, and f_{i,j} and I_{i,j} are the processing function and the input data of the j-th partition of multiprocessing in the i-th stage of the pipeline, respectively. Based on Eq. (2), a parallel processing model of pipelined multiprocessor modules may be formulated. For this model, let t_{i,j} denote the computation time of the j-th partition in the i-th stage, and t_{c,i} the communication time between stage i and stage i+1.

In an earlier work [1], the throughput of the parallel processing model was analyzed in detail for the case in which there is only one PE in each stage. Some of the important observations are: (i) the throughput decreases almost linearly as the variation of computation time increases; (ii) the closer together any two stages with a larger (than the other stages) variation of computation time are, the greater the decrease in throughput; and (iii) the throughput can be significantly improved by increasing the number of buffers in front of and behind the stage with the largest variation of computation time.

Multiple-PE-Stage Pipeline

In this paper, we consider the case where each stage of the parallel processing model consists of multiple PEs. Inter-stage communication is then a data transfer from multiple PEs to multiple PEs. One major factor that considerably affects system throughput is the number of physical channels between successive stages, denoted by n_c. A larger number of channels would generally achieve higher throughput, since less channel contention is expected; a more meaningful approach, however, is to find the optimum number of channels from the viewpoint of cost-effectiveness. Another factor to be considered in this analysis is the number of buffers between successive stages, more specifically the number of buffers per channel, b_c. System performance also depends on the characteristics of the task to be executed. Therefore, we cannot carry out perfo...
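To make the multiple-PE-stage model concrete, the sketch below simulates a pipeline in which each stage holds several PEs and successive stages are linked by a bounded buffer pool, and reports throughput as completed items per second. It is only an illustration under assumed parameters (Python threads standing in for PEs, exponentially distributed service times, and a single shared buffer pool per stage boundary rather than n_c separate channels with b_c buffers each); it is not the paper's analytical model.

```python
# Minimal sketch of a multiple-PE-per-stage pipeline with bounded inter-stage
# buffers; threads stand in for PEs and all parameters are illustrative.
import queue
import random
import threading
import time

def run_pipeline(n_items, pes_per_stage, buffer_size, mean_service):
    stages = len(pes_per_stage)
    # qs[i] feeds stage i; qs[stages] is the sink. Bounded queues model the
    # finite buffer pool between successive stages.
    qs = [queue.Queue() if i in (0, stages) else queue.Queue(maxsize=buffer_size)
          for i in range(stages + 1)]

    def pe(stage):
        while True:
            item = qs[stage].get()                       # wait for input data
            time.sleep(random.expovariate(1.0 / mean_service[stage]))  # "compute"
            qs[stage + 1].put(item)                      # blocks if buffer is full

    for i in range(stages):
        for _ in range(pes_per_stage[i]):
            threading.Thread(target=pe, args=(i,), daemon=True).start()

    start = time.time()
    for k in range(n_items):
        qs[0].put(k)
    for _ in range(n_items):                             # drain the sink
        qs[stages].get()
    return n_items / (time.time() - start)

# Example: 3 stages with 2, 4, and 2 PEs; the middle stage is slower.
print(run_pipeline(500, pes_per_stage=[2, 4, 2], buffer_size=4,
                   mean_service=[0.002, 0.004, 0.002]))
```

Varying buffer_size and pes_per_stage in this toy model lets one probe the same effects discussed above, for example how buffers placed around a stage with large computation-time variation influence throughput.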