The design and implementation of zero copy MPI using commodity hardware with a high performance network

Section: Results (mentioning)

confidence: 99%

Real-Time Estimation of Hip Range of Motion for Total Hip Replacement Surgery

Medical Image Computing and Computer-Assisted Intervention – MICCAI 2004

Kawasaki

Sato

et al. 2004

Abstract. This paper presents the design and implementation of a range of motion estimation method that is capable of fine-grained estimation during total hip replacement (THR) surgery. Our method combines an adaptive refinement strategy with a high performance computing system in order to enable real-time estimation. The experimental results indicate that the implementation on a cluster of 64 PCs enables intraoperative estimation of 360 × 360 × 180 stance configurations within a half minute, and thereby plays a key role in selecting and aligning the optimal combination of artificial joint components during THR surgery.

“…To evaluate our parallel algorithm, we have implemented our algorithm on a cluster of PCs by using the C++ language and MPICH-SCore library [10], which is a fast implementation of the Message Passing Interface (MPI) standard [11]. Our cluster consists of 64 symmetric multiprocessor (SMP) nodes.…”

Section: Results (mentioning)

confidence: 99%

Design and Implementation of Parallel Nonrigid Image Registration Using Off-the-Shelf Supercomputers

Lecture Notes in Computer Science

Ooyama

Takeuchi

et al. 2003

Abstract. This paper presents a new parallel algorithm for nonrigid image registration using off-the-shelf supercomputers, or clusters of PCs. Our algorithm realizes scalable registration for high resolution three-dimensional (3-D) images by employing three techniques: (1) data distribution; (2) data-parallel processing; and (3) dynamic load balancing. The experimental results show that our parallel implementation on a cluster of 64 off-the-shelf PCs (with 128 processors) registers liver CT images of 512×512×159 voxels within 8 minutes while a sequential implementation takes 12 hours. Furthermore, our implementation allows processors to use less memory, and thereby enables us to align 1024×1024×590 voxel images, which is not easy for single processor systems due to the restrictions on the memory space and the processing time.

“…• Reduction amount of trace size; In these studies, we used the MPICH-SCore [18] and MPE [5] libraries, a fast MPI implementation and its trace generation library, respectively. MPE allows us to visualize a trace by using Jumpshot [8], a visualization tool widely used and distributed with MPE.…”

Section: Case Studies (mentioning)

confidence: 99%

Trace reduction for performance improvement assessment of message passing parallel programs

Systems & Computers in Japan

Kanbe

Okita

et al. 2006

Abstract. This paper proposes a trace reduction method for assessing the improvability of the performance of message passing parallel programs. This assessment is based on a what-if prediction approach that forecasts future program performance, for example, the execution time if the target program is modified according to typical tuning techniques. Our method reduces the size of trace files by aggregating records of communications that do not change the predicted execution time. In order to avoid recording such useless information, our method automatically identifies these records during program execution by comparing the occurrence time of sends and receives. In case studies, our method reduces both the analysis time for what-if predictions and the size of trace files roughly by half. We also discuss the usability of our method.