1990
DOI: 10.1145/77626.79170

A set of level 3 basic linear algebra subprograms

Abstract: This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-matrix operations and should provide for efficient and portable implementations of algorithms for high-performance computers.
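Among the routines this paper specifies is the general matrix-matrix multiply GEMM. The sketch below is illustrative only: the paper defines a Fortran 77 interface, while the C binding shown here (cblas_dgemm) comes from the later CBLAS standard, and the 2x2 matrices are arbitrary test values.

/* Minimal sketch of a Level 3 BLAS operation through the CBLAS interface:
 * C := alpha*A*B + beta*C with DGEMM, the general matrix-matrix multiply. */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    double A[4] = {1.0, 2.0,
                   3.0, 4.0};      /* 2x2, row-major */
    double B[4] = {5.0, 6.0,
                   7.0, 8.0};      /* 2x2, row-major */
    double C[4] = {0.0, 0.0,
                   0.0, 0.0};      /* result accumulator */

    /* C := 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,           /* M, N, K */
                1.0, A, 2,         /* alpha, A, lda */
                B, 2,              /* B, ldb */
                0.0, C, 2);        /* beta, C, ldc */

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}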

Cited by 1,442 publications (788 citation statements)
References 17 publications
“…All computations were performed on a Sun Fire X4600 M2 with 16 AMD 2.8 GHz cores and 32 GB of RAM. The original MATLAB NTF codes were rewritten in C++ and compiled with several libraries including LAPACK [1], ScaLAPACK [2], BLACS [4], BLAS [5] and MPICH [8].…”
Section: Data and Experimental Results
confidence: 99%
“…The first strategy is implemented using calls to ScaLAPACK; the second strategy is implemented with calls to LAPACK and BLAS [12]. They compare the strategies using Cholesky factorization on a network of workstations.…”
Section: Literature Survey
confidence: 99%
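The second strategy described in the quoted passage (LAPACK plus BLAS on a single node) can be illustrated with a Cholesky factorization. This is a hedged sketch, not code from the surveyed work: it assumes the LAPACKE C interface to LAPACK's DPOTRF and an arbitrary 3x3 symmetric positive definite test matrix.

/* Cholesky factorization with LAPACKE_dpotrf, the C interface to LAPACK's
 * DPOTRF; the blocked algorithm inside DPOTRF relies on Level 3 BLAS kernels. */
#include <stdio.h>
#include <lapacke.h>

int main(void)
{
    /* 3x3 SPD matrix, row-major; only the upper triangle is referenced. */
    double A[9] = { 4.0, 2.0, 2.0,
                    2.0, 5.0, 3.0,
                    2.0, 3.0, 6.0 };

    lapack_int info = LAPACKE_dpotrf(LAPACK_ROW_MAJOR, 'U', 3, A, 3);
    if (info != 0) {
        fprintf(stderr, "dpotrf failed: info = %d\n", (int)info);
        return 1;
    }

    /* A now holds the upper triangular factor U with A = U^T * U. */
    printf("U[0][0..2] = %g %g %g\n", A[0], A[1], A[2]);
    return 0;
}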
“…Moreover, C++ interfaces well with other programming languages, and we can easily encapsulate functions and libraries implemented in other languages. In fact, in future releases we plan to use the BLAS [10] and LAPACK [2] libraries to speed up the computation: BLAS offers highly efficient routines (implemented in Fortran) for vector and matrix multiplications, operations that are fundamental for accelerating both the forward and the backward step of the backpropagation algorithm. LAPACK, which uses BLAS for its low-level subroutines, offers a set of very efficient implementations of linear algebra functions, such as matrix QR decomposition or pseudoinverse calculation.…”
Section: Efficiency of C++ Libraries
confidence: 99%
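The quoted passage plans to express the forward and backward steps of backpropagation as BLAS matrix operations. As a hedged sketch of the forward step only, assuming the CBLAS interface and a hypothetical forward_layer helper (the sizes and values below are illustrative, not taken from the cited work):

/* Forward step of one fully connected layer as a single BLAS matrix-matrix
 * multiply: Z = X * W, where X is a batch of inputs and W a weight matrix. */
#include <stdio.h>
#include <cblas.h>

/* Z (batch x out) := X (batch x in) * W (in x out), all row-major. */
static void forward_layer(int batch, int in, int out,
                          const double *X, const double *W, double *Z)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                batch, out, in,
                1.0, X, in,
                W, out,
                0.0, Z, out);
}

int main(void)
{
    double X[2 * 3] = {1, 2, 3,
                       4, 5, 6};        /* batch of 2 inputs, 3 features */
    double W[3 * 2] = {0.1, 0.2,
                       0.3, 0.4,
                       0.5, 0.6};       /* 3 inputs -> 2 units */
    double Z[2 * 2];

    forward_layer(2, 3, 2, X, W, Z);    /* pre-activations for the batch */
    printf("Z = [%g %g; %g %g]\n", Z[0], Z[1], Z[2], Z[3]);
    return 0;
}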