We present the use of parallel processors for the solution of the drift-diffusion semiconductor device equations discretized on an irregular grid. Preconditioning, partitioning, and communication scheduling algorithms are developed to implement an efficient and robust preconditioned iterative linear solver. The parallel program is executed on a 64-node CM-5 and is compared with PILS running on a single processor. We observe that parallel efficiency increases with problem size. For large problems, we obtain 60% efficiency for CGS with no preconditioning. Using CGS with processor ILU and magnitude threshold fill-in preconditioning on the CM-5 and CGS with ILU for PILS, we attain 50% efficiency for the solution of the large matrices.
1: Introduction

The simulation of complex three-dimensional semiconductor devices requires computers with significant computational power. Previous work [1,2,3,4] has shown how massively parallel computers can be used efficiently for drift-diffusion device simulation. All these simulators used rectangular grids since they are easy to implement, have perfect load balance, and have regular communication patterns. However, irregular grids are important in the field of device simulation since they allow the modeling of nonrectangular device boundaries and do not require grids for quasi-neutral regions. Reference [5] gives an example of a diagonal alpha particle track that would require 2,000,000 rectangular grid points to model accurately, whereas a general irregular grid would require only 6900 grid points to achieve the same accuracy. Even with the reduction in grid points obtained by the use of irregular grids, semiconductor simulation still requires significant computational power. A standard latch-up problem, which requires over 50,000 irregular grid nodes, may take five hours to simulate on vector machines such as the Cray-2 [6]. Other applications such as SOI, parasitic MOSFETs [7], and silicon pixel detectors [8] may require more computational power. Although faster vector supercomputers may offer the computational power needed, parallel processors provide an attractive alternative.

We present a Connection Machine 5 (CM-5) [9] device simulator that uses an irregular grid automatically generated by the Omega program [10,11]. For sequential device simulators, the nonlinear algebraic system of equations arising from the discretization is efficiently and accurately solved by a variation of the basic Newton-Raphson algorithm. As usual in this algorithm, most of the computation time is spent on the solution of the linearized system of equations. The focus of our work is to speed up this step.
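To illustrate why the linearized solve dominates the cost, the following is a minimal Python sketch of a basic Newton-Raphson iteration. It is not the simulator's implementation. The two-unknown toy system, the function names (`solve_linear`, `newton_raphson`, `toy_residual`, `toy_jacobian`), and the dense Gaussian elimination that stands in for the linear solver are all assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by dense Gaussian elimination with partial pivoting.

    Stand-in for the linear solver; assumes a is square and nonsingular.
    """
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Bring the largest remaining entry of this column onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_raphson(residual, jacobian, x0, tol=1e-10, max_iter=50):
    """Basic Newton-Raphson: repeatedly solve J(x) dx = -F(x), then update x."""
    x = list(x0)
    for _ in range(max_iter):
        f = residual(x)
        if max(abs(v) for v in f) < tol:
            break
        # Linearized system: one linear solve per nonlinear iteration.
        dx = solve_linear(jacobian(x), [-v for v in f])
        x = [xi + di for xi, di in zip(x, dx)]
    return x


# Hypothetical toy system (not the device equations):
#   x^2 + y^2 - 4 = 0,  x - y = 0
def toy_residual(v):
    x, y = v
    return [x * x + y * y - 4.0, x - y]


def toy_jacobian(v):
    x, y = v
    return [[2.0 * x, 2.0 * y], [1.0, -1.0]]


if __name__ == "__main__":
    print(newton_raphson(toy_residual, toy_jacobian, [1.0, 0.5]))
```

In a device simulator the Jacobian is large and sparse, so the call that `solve_linear` represents here is where an iterative method such as CGS, with or without preconditioning, would be used instead of dense elimination.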
We present heuristics for partitioning, communication scheduling, and preconditioning for the efficient implementation of a parallel iterative linear solver. Parallel results are compared with a sequential program called PILS [12].

This paper is organized as follows. An overview of the device equations and how they are generally solved is given first. Section 3 describes a parallel line...