Marko Kabic scite author profile

Marko Kabic

4 Publications

24 Citation Statements Received

102 Citation Statements Given

Affiliations

ETH Zurich, Swisscom (Switzerland)

Publications

On the parallel I/O optimality of linear algebra kernels

Kwasniewski, Kabic, Ben-Nun et al. 2021

Matrix factorizations are among the most important building blocks of scientific computing. However, state-of-the-art libraries are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically communication-optimal 2.5D decomposition. We first establish a theoretical framework for deriving parallel I/O lower bounds for linear algebra kernels, and then utilize its insights to derive Cholesky and LU schedules, both communicating $N^3/(P\sqrt{M})$ elements per processor, where M is the local memory size. The empirical results match our theoretical analysis: our implementations communicate significantly less than Intel MKL, SLATE, and the asymptotically communication-optimal CANDMC and CAPITAL libraries. Our code outperforms these state-of-the-art libraries in almost all tested scenarios, with matrix sizes ranging from 2,048 to 524,288 on up to 512 CPU nodes of the Piz Daint supercomputer, decreasing the time-to-solution by up to three times. Our code is ScaLAPACK-compatible and available as an open-source library.

Red-blue pebbling revisited

Kwasniewski, Kabic, Besta et al. 2019

COSTA: Communication-Optimal Shuffle and Transpose Algorithm with Process Relabeling

Kabic, Pintarelli, Kozhevnikov et al. 2021

A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies

Hè, Kabic 2023 (preprint)

Nomenclature

Σ: Covariance matrix
G: Gram/kernel matrix
k(•): Kernel function
P(•): Probability density
P(•): Token mixing process
Re(•): Function that extracts the real component of a complex number
…: Element at ith position of column vector a
A_{*:j}: Column vector in jth row of A
A_{i,j}: Element in ith row, jth column of …
…: … matrix of the embedding dimension
F_s (L×L): Vandermonde matrix of the sequence dimension
W: Weight matrix learned with element-wise non-linearity (e.g., ReLU, GELU)
W_C (L×L): Weight matrix of a single convolution kernel
W_K (D×N): Weight matrix of attention key (for self-attention, N = M)
W_Q (D×M): Weight matrix of attention query
W_V (D×M): Weight matrix of attention value
X: Resulting tokens with inductive bias introduced into X
X (L×D): Input sequence of length L and embedding dimension D, where L … D
