We briefly describe the Poor Man's Supercomputer (PMS) project carried out at Eötvös University, Budapest. The goal was to construct a cost-effective, scalable, fast parallel computer for numerical calculations of physical problems that can be implemented on a lattice with nearest-neighbour interactions. To this end we developed the PMS architecture using PC components, and designed special, low-cost communication hardware and the driver software for the Linux OS. Our first implementation of PMS (PMS1) includes 32 nodes. The performance of PMS1 was tested with Lattice Gauge Theory simulations. Using pure SU(3) gauge theory or the bosonic MSSM on PMS1, we obtained price-to-sustained-performance ratios of 3$/Mflop and 0.45$/Mflop for double and single precision operations, respectively. The design of the special hardware and the communication driver are freely available upon request for non-profit organizations.
Introduction

Our purpose was to build a high performance supercomputer from PC elements. We use PCs for two reasons: they have excellent cost/performance ratios [1], and they can easily be upgraded when faster motherboards and CPUs become available. The PMS project started in 1998, and the machine is now ready for physical calculations. Our first PMS machine (PMS1) consists of 32 PCs arranged in a three-dimensional 2 × 4 × 4 mesh. Each node has two special communication cards providing fast communication through flat cables to its six neighbours. This gives much better performance than a simple Ethernet link.

Since the machine is built from PCs, the latest versions of all programming languages (such as C and Fortran) can be used for coding. Writing applications is straightforward: one only has to keep in mind the three-dimensional mesh structure of the machine, and no deeper understanding of how the communication works is required. A set of routines written in C makes the communication of data between the nodes easy. The machine works in Single Instruction Multiple Data (SIMD) mode: all processors execute the same program, while the data they work on may differ.

Nowadays double precision floating point arithmetic is necessary for accurate results. PMS offers this precision, since the processors have double precision Floating Point Units (FPUs). In cases where single precision is enough, the special MMX instruction set of the AMD K6-2 processors can be used. This provides a much higher performance, in principle 8 times higher, than the standard double precision mode.

The following sections describe the hardware and software architectures of PMS. We first give a short overview of the machine and then describe the hardware and the software in more detail. Some performance results are also presented, and an outlook is given.