Optimization of BLAS on the Cell Processor

Saxena, Vaibhav; Agrawal, Prashant; Sabharwal, Yogish; Garg, Vijay K.; Kuruvilla, Vimitha A.; Gunnels, John A.

doi:10.1007/978-3-540-89894-8_6

Cited by 7 publications

(2 citation statements)

References 11 publications

“…Notice that, in the reality, many other architectural details affect the optimal block size, but the size of 64 × 64 is usually selected as the optimal block size for porting the dense linear algebra problems to Cell/B.E. processor [9], [22].…”

Section: Optimal Block Size (mentioning)

confidence: 99%

Matrix Inversion on the Cell/B.E. Processor

Yokoyama, Matsumoto, Sedukhin (2009)

2009 11th IEEE International Conference on High Performance Computing and Communications

The problem of inverting matrices is one that occurs in some problems of practical importance. This paper introduces and evaluates the block algorithm for high performance matrix inversion on the Cell Broadband Engine (Cell/B.E.) processor. The Cell/B.E. is a heterogeneous multi-core processor on a single chip jointly developed by Sony, Toshiba and IBM, which has a very high speed of the single precision floating-point arithmetic. The discussed matrix inversion algorithm is a combination of the block Algebraic Path Problem algorithm and the well-known block matrix inversion algorithm based on the LU decomposition. For relatively big matrices, this combined block algorithm spends the most time in computing matrix-matrix multiplication of blocks and achieves 149.4 Gflop/s on Cell/B.E., when PPE and six SPEs of PlayStation3 are used, or 93.4% of the aggregated double (PPE) and single (SPEs) precision peak performance, which is 160.0 Gflop/s.
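To make the block-size and peak-performance figures above concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the loop nest is a generic blocked matrix multiplication, and the 4-byte element size assumes single precision. It shows that one 64 × 64 single-precision block occupies 16 KB, so several blocks fit comfortably in an SPE's 256 KB local store, and that 149.4 Gflop/s out of 160.0 Gflop/s is the quoted 93.4%.

```python
BLOCK = 64  # block edge quoted in the citation statement above


def block_bytes(block=BLOCK, bytes_per_element=4):
    """Footprint in bytes of one square block (4 bytes assumes single precision)."""
    return block * block * bytes_per_element


def blocked_matmul(a, b, block=BLOCK):
    """Multiply two square matrices (lists of lists) one block at a time.

    A generic illustration of blocking, not the algorithm from the paper.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                # Block update: C[i0.., j0..] += A[i0.., k0..] * B[k0.., j0..]
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + block, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def fraction_of_peak(achieved_gflops, peak_gflops):
    """Achieved performance as a fraction of the stated peak."""
    return achieved_gflops / peak_gflops


if __name__ == "__main__":
    print(block_bytes())                     # bytes per 64 x 64 block
    print(fraction_of_peak(149.4, 160.0))    # fraction behind the quoted 93.4%
```

On real hardware the inner block update would be a hand-tuned SIMD kernel running on an SPE; the pure-Python loops only show the traversal order.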

“…It should be noted that the default Octave installation utilises hardware-specific BLAS libraries which are provided with the IBM Cell SDK. These libraries are highly optimised for the Cell architecture [46] and can utilise both Cell processors available in the QS22 (a total of 16 SPEs).…”

Section: Speedups and Scalability (mentioning)

confidence: 99%

Accelerating the Execution of Matrix Languages on the Cell Broadband Engine Architecture

Khoury, Burgstaller, Scholz (2011)

IEEE Trans. Parallel Distrib. Syst.

Matrix languages, including MATLAB and Octave, are established standards for applications in science and engineering. They provide interactive programming environments that are easy to use due to their scripting languages with matrix data types. Current implementations of matrix languages do not fully utilise high-performance, special-purpose chip architectures such as the IBM PowerXCell processor (Cell), which is currently used in the fastest computer in the world. We present a new framework that extends Octave to harness the computational power of the Cell. With this framework the programmer is relieved of the burden of introducing explicit notions of parallelism. Instead the programmer uses a new matrix data-type to execute matrix operations in parallel on the synergistic processing elements (SPEs) of the Cell. We employ lazy evaluation semantics for our new matrix data-type to obtain execution traces of matrix operations. Traces are converted to data dependence graphs; operations in the data dependence graph are lowered (split into sub-matrices), scheduled and executed on the SPEs. Thereby we exploit (1) data parallelism, (2) instruction level parallelism, (3) pipeline parallelism and (4) task parallelism of matrix language programs. We conducted extensive experiments to show the validity of our approach. Our Cell-based implementation achieves speedups of up to a factor of 12 over code run on recent Intel Core2 Quad processors.
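The lazy-evaluation idea described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the framework's code: `LazyMatrix`, `schedule` and `evaluate` are hypothetical names, the graph is evaluated sequentially, and the lowering into sub-matrices and the dispatch to SPEs are omitted. Operators record a node in a dependence graph instead of computing, and work happens only when a result is requested.

```python
class LazyMatrix:
    """Hypothetical lazy matrix: operators record a trace node, nothing is computed."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op                  # "leaf", "add" or "mul"
        self.inputs = tuple(inputs)   # nodes this operation depends on
        self.value = value            # filled in only by evaluate()

    @classmethod
    def leaf(cls, rows):
        return cls("leaf", value=[list(r) for r in rows])

    def __add__(self, other):
        return LazyMatrix("add", (self, other))

    def __matmul__(self, other):
        return LazyMatrix("mul", (self, other))


def schedule(root):
    """Return the nodes of the dependence graph with inputs before consumers."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for dep in node.inputs:
            visit(dep)
        order.append(node)

    visit(root)
    return order


def evaluate(root):
    """Execute the recorded operations in dependence order (sequentially here)."""
    for node in schedule(root):
        if node.op == "add":
            x, y = (n.value for n in node.inputs)
            node.value = [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]
        elif node.op == "mul":
            x, y = (n.value for n in node.inputs)
            node.value = [[sum(p * q for p, q in zip(rx, col)) for col in zip(*y)]
                          for rx in x]
    return root.value


if __name__ == "__main__":
    a = LazyMatrix.leaf([[1, 2], [3, 4]])
    b = LazyMatrix.leaf([[5, 6], [7, 8]])
    c = (a @ b) + a          # builds a graph; no arithmetic yet
    print(evaluate(c))
```

Because the whole expression is visible as a graph before anything runs, a scheduler is free to split nodes into sub-matrix tasks and run independent ones in parallel, which is the opportunity the abstract describes.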