2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2016.114
A Multi-Platform Evaluation of the Randomized CX Low-Rank Matrix Factorization in Spark

Abstract: We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1 TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent pe…
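The CX factorization described in the abstract approximates a matrix A by a small set of its own columns, A ≈ CX, with columns chosen by randomized leverage-score sampling. A minimal single-node sketch of this idea (illustrative only; variable names and the small randomized range-finder are assumptions, not the paper's distributed Spark/C implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def cx_decomposition(A, k, c):
    """Randomized CX sketch: estimate rank-k column leverage scores of A,
    sample c columns proportionally, and set X = pinv(C) @ A so A ~= C @ X."""
    m, n = A.shape
    # Randomized range finder: a Gaussian sketch of A^T approximates the
    # dominant rank-k row space of A.
    Omega = rng.standard_normal((m, k))
    Y = A.T @ Omega                      # n x k sketch
    Q, _ = np.linalg.qr(Y)               # orthonormal basis, n x k
    # Approximate leverage scores = squared row norms of the basis.
    lev = np.sum(Q ** 2, axis=1)
    p = lev / lev.sum()
    # Sample c distinct column indices with probability ~ leverage.
    idx = rng.choice(n, size=c, replace=False, p=p)
    C = A[:, idx]                        # actual columns of A
    X = np.linalg.pinv(C) @ A            # least-squares fit of A onto C
    return C, X, idx

# Low-rank test matrix: exactly rank 3, so CX should reconstruct it well.
A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 40))
C, X, idx = cx_decomposition(A, k=3, c=8)
err = np.linalg.norm(A - C @ X) / np.linalg.norm(A)
```

Because C consists of actual data columns, the factors remain interpretable in the original feature space, which is the motivation for using CX on the MSI dataset.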

Cited by 5 publications (2 citation statements) | References 21 publications
“…In the current GstLAL pipeline, streamer multi-media framework has been used and HTC based distributed set-up is developed to run all the steps involved in this pipeline [7]. In recent work, Gittens et al [47] showed the adaptability of RMF algorithm in a Apache-SPARK set-up. SPARK-optimized code with enhanced computation power, making our algorithms much faster.…”
Section: Discussion
confidence: 99%
“…On the other hand, a non-naïve version of this meta-algorithm is very promising: it gives the best worst-case algorithm in RAM [164,69,71] (using Sketch-and-Solve, described below); it beats LAPACK for high precision in wall-clock time [157,9,134] (using Sketch-and-Precondition, described below); it leads to super-terabyte-scale implementations in parallel/distributed environments [174,85]; and it gives the foundation for low-rank approximations and the rest of RandNLA [72,73,124,68]. Fundamental structural result.…”
Section: Least-squares Approximation
confidence: 99%
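The Sketch-and-Solve meta-algorithm mentioned in the excerpt above compresses an overdetermined least-squares problem with a random sketch and solves the much smaller sketched problem. A minimal illustrative sketch (a dense Gaussian sketch is used here for simplicity; sizes and names are assumptions, not taken from the cited works):

```python
import numpy as np

rng = np.random.default_rng(1)

def sketch_and_solve(A, b, s):
    """Compress the m x n least-squares problem min ||Ax - b|| to s rows
    with a Gaussian sketch S, then solve the small s x n problem."""
    m, n = A.shape
    S = rng.standard_normal((s, m)) / np.sqrt(s)   # Gaussian sketch
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

# Tall overdetermined system with small noise.
m, n = 2000, 20
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketch_and_solve(A, b, s=200)
# Residual of the sketched solution relative to the optimal residual;
# with s >> n this ratio is close to 1 with high probability.
rel = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
```

The sketched solve touches only an s x n system instead of m x n, which is the source of the speedups the excerpt attributes to randomized methods; Sketch-and-Precondition instead uses the sketch to build a preconditioner and iterates to high accuracy.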