2011
DOI: 10.1007/978-3-642-23397-5_41
Iterative Sparse Matrix-Vector Multiplication for Integer Factorization on GPUs

Cited by 7 publications (9 citation statements) · References 13 publications
“…As reported in [17], experimental results on a GeForce GTX 480 show that the SpMV kernel with the cache-blocking method is, in the best case, 5x faster than the unblocked CSR kernel. In [18], speedups between 4 and 8 are reported on a single GPU for a number of tested NFS matrices, compared to an optimized multi-core implementation. This gives us strong reason to believe that our algorithm at least matches and exceeds conventional CPU algorithms.…”
Section: Cost Analysis
Confidence: 87%
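For context, the "unblocked CSR kernel" that serves as the baseline in such comparisons is typically the scalar one-thread-per-row kernel sketched below. This is an illustrative reconstruction under that assumption, not the code from [17]; all names are ours.

```
// Minimal sketch of a baseline (unblocked) CSR SpMV kernel:
// one thread per row, no cache blocking. Illustrative only.
__global__ void spmv_csr_scalar(int num_rows,
                                const int *row_ptr,   // CSR row offsets, length num_rows + 1
                                const int *col_idx,   // column indices of nonzeros
                                const float *values,  // nonzero values
                                const float *x,       // input vector
                                float *y)             // output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float sum = 0.0f;
        // Accumulate this row's dot product with x.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += values[j] * x[col_idx[j]];
        y[row] = sum;
    }
}
```

The scattered reads of x[col_idx[j]] are exactly what cache blocking targets, which is why the blocked variant in [17] can outperform this baseline so markedly.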
“…For the densest part we use the dense format, and CSR and SLE for the sparse part, to improve performance. More details can be found in Schmidt's paper [18].…”
Section: B. Sparse Matrix Formats
Confidence: 98%
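As one hypothetical illustration of such a density-based split (the actual dense/CSR/SLE partitioning criteria are those of Schmidt's paper [18] and are not reproduced here), rows can be routed to a dense-format kernel or a sparse-format kernel according to their nonzero count:

```
// Hypothetical host-side sketch: rows whose nonzero count exceeds a
// threshold go to a dense-format kernel, the rest stay in a sparse
// format (e.g. CSR or SLE). The threshold is a free tuning parameter.
#include <vector>

struct Split {
    std::vector<int> dense_rows;   // handled by a dense-format kernel
    std::vector<int> sparse_rows;  // handled by a CSR/SLE kernel
};

Split split_by_density(const std::vector<int> &row_ptr, int threshold)
{
    Split s;
    int num_rows = (int)row_ptr.size() - 1;
    for (int r = 0; r < num_rows; ++r) {
        int nnz = row_ptr[r + 1] - row_ptr[r];  // nonzeros in row r
        (nnz >= threshold ? s.dense_rows : s.sparse_rows).push_back(r);
    }
    return s;
}
```

The design rationale is that NFS matrices have a few very dense rows/columns for which sparse formats waste index storage and suffer divergence, while the long sparse tail is best kept compressed.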
“…These works have explored efficiently implementing SpMV over real numbers. Schmidt et al. [19] proposed an optimized matrix format to accelerate exact SpMV over GF(2), which can be used in the linear algebra step of the Number Field Sieve (NFS) for integer factorization [22]. Boyer et al. [8] adapted SpMV kernels to small finite fields and rings Z/mZ, using double-precision floating-point numbers to represent ring elements.…”
Section: Sparse Matrix-Vector Product on GPUs
Confidence: 99%
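To make the GF(2) setting concrete: in the NFS linear algebra step the matrix is typically multiplied by a block of binary vectors packed into machine words, so that addition becomes XOR and the stored nonzeros make the multiplication implicit. The sketch below assumes a plain CSR layout and 64 packed vectors per word; the optimized format of Schmidt et al. [19] uses a different layout, so this is only a minimal illustration of the arithmetic.

```
// Sketch of exact SpMV over GF(2): each x[i]/y[i] packs 64 binary
// vectors into one 64-bit word. Since stored entries are exactly the
// nonzeros, multiplying by 1 is implicit and addition is XOR.
__global__ void spmv_gf2_csr(int num_rows,
                             const int *row_ptr,
                             const int *col_idx,
                             const unsigned long long *x,  // 64 packed input vectors
                             unsigned long long *y)        // 64 packed output vectors
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        unsigned long long acc = 0ULL;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            acc ^= x[col_idx[j]];  // addition in GF(2) is XOR
        y[row] = acc;
    }
}
```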
“…Even parallel implementations that use the processing power of multi-core CPUs or a single-node GPU do not perform well enough for medium-sized matrices of a few hundred thousand rows and columns. Matrices arising in fields such as climatology, seismology, and cryptography [4] reach sizes of multiple millions of rows and columns and often require real-time processing.…”
Section: Introduction
Confidence: 99%