Force-directed algorithms are widely used to generate aesthetically pleasing layouts of graphs or networks arising in many scientific disciplines. To visualize large-scale graphs, several parallel algorithms have been discussed in the literature. However, existing parallel algorithms do not utilize the memory hierarchy efficiently and often offer limited parallelism. This paper addresses these limitations with BatchLayout, an algorithm that groups vertices into minibatches and processes them in parallel. BatchLayout also employs cache blocking techniques to utilize the memory hierarchy efficiently. Greater parallelism and improved memory access, coupled with force-approximation techniques, better initialization, and an optimized learning rate, make BatchLayout significantly faster than other state-of-the-art algorithms such as ForceAtlas2 and OpenOrd. The visualization quality of layouts from BatchLayout is comparable to or better than that of similar visualization tools. All of our source code, links to datasets, results, and log files are available at https://github.com/khaled-rahman/BatchLayout.
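The following is a minimal Python sketch of the minibatch idea described above: a spring-electrical layout that applies attractive forces along edges and repulsive forces between all pairs, updating one vertex minibatch at a time with a decaying learning rate. The function and parameter names (minibatch_layout, batch_size, lr) are illustrative assumptions, not BatchLayout's actual API, and the sketch omits the cache-blocking and force-approximation machinery of the real algorithm.

```python
# Hypothetical minibatch force-directed update (illustration only).
import numpy as np
import scipy.sparse as sp

def minibatch_layout(adj, dim=2, batch_size=256, iters=100, lr=0.1, seed=0):
    """Spring-electrical layout: attraction along edges, repulsion between
    all vertex pairs, applied one vertex minibatch at a time."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    pos = rng.uniform(-1, 1, size=(n, dim))              # random initialization
    adj = adj.tocsr()
    for _ in range(iters):
        for start in range(0, n, batch_size):            # vertices grouped into minibatches
            batch = np.arange(start, min(start + batch_size, n))
            diff = pos[batch, None, :] - pos[None, :, :]  # (b, n, dim) pairwise differences
            dist2 = np.maximum((diff ** 2).sum(-1), 1e-9)
            rep = (diff / dist2[..., None]).sum(axis=1)   # repulsion ~ 1/distance
            attr = np.zeros_like(rep)
            for i, u in enumerate(batch):                 # linear-spring attraction along edges
                nbrs = adj.indices[adj.indptr[u]:adj.indptr[u + 1]]
                attr[i] = (pos[nbrs] - pos[u]).sum(axis=0)
            pos[batch] += lr * (attr + rep)               # gradient-style minibatch step
        lr *= 0.99                                        # simple learning-rate decay
    return pos

if __name__ == "__main__":
    g = sp.random(500, 500, density=0.01, format="csr", random_state=1)
    g = ((g + g.T) > 0).astype(float)                     # symmetrize the random graph
    print(minibatch_layout(g).shape)
```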
This paper studies the overall system power variations of two multi-core architectures, an 8-core Intel and a 32-core AMD workstation, while using these machines to execute a wide variety of sequential and multi-threaded benchmarks under varying compiler optimization settings and runtime configurations. Our extensive experimental study provides insights for answering two questions: 1) what degree of impact can application-level optimizations have on reducing the overall system power consumption of modern CMP architectures; and 2) what strategies can compilers and application developers adopt to achieve a balance of performance and power efficiency for applications from a variety of scientific and embedded-systems domains.
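As a rough illustration of this kind of experiment (not the paper's actual harness), the sketch below rebuilds a benchmark at several compiler optimization levels and thread counts and reads package energy from Linux's Intel RAPL powercap interface. The source file name, binary name, and sysfs path are assumptions that depend on the machine; the AMD workstation in the study would expose a different energy interface, and the counter read here ignores wraparound.

```python
# Illustrative sweep over compiler optimization levels and thread counts,
# measuring package energy via /sys/class/powercap (Intel RAPL, Linux).
import pathlib
import subprocess
import time

RAPL = pathlib.Path("/sys/class/powercap/intel-rapl:0/energy_uj")  # package-0 energy counter

def read_energy_uj():
    return int(RAPL.read_text())

def measure(cmd):
    e0, t0 = read_energy_uj(), time.time()
    subprocess.run(cmd, check=True)
    e1, t1 = read_energy_uj(), time.time()
    return (e1 - e0) / 1e6, t1 - t0          # joules, seconds

for opt in ["-O0", "-O1", "-O2", "-O3"]:
    # "benchmark.c" is a placeholder for whichever benchmark is under test.
    subprocess.run(["gcc", opt, "-fopenmp", "benchmark.c", "-o", "bench"], check=True)
    for threads in [1, 2, 4, 8]:
        joules, secs = measure(["env", f"OMP_NUM_THREADS={threads}", "./bench"])
        print(f"{opt} threads={threads}: {joules:.1f} J, {secs:.2f} s, "
              f"{joules / secs:.1f} W average")
```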
We develop a fused matrix multiplication kernel that unifies sampled dense-dense matrix multiplication and sparse-dense matrix multiplication under a single operation called FusedMM. By using user-defined functions, FusedMM can capture almost all computational patterns needed by popular graph embedding and GNN approaches. FusedMM is an order of magnitude faster than its equivalent kernels in Deep Graph Library. The superior performance of FusedMM comes from its low-level vectorized kernels, a suitable load balancing scheme, and an efficient utilization of the memory bandwidth. FusedMM can tune its performance using a code generator and performs equally well on Intel, AMD, and ARM processors. FusedMM speeds up an end-to-end graph embedding algorithm by up to 28× on different processors.
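A minimal Python sketch of the fused SDDMM + SpMM pattern is shown below: for each edge the kernel computes a sampled dense-dense product, applies a user-defined edge function, and immediately aggregates the weighted neighbor embedding, without materializing the intermediate edge-value matrix. The real FusedMM kernels are low-level vectorized C code; the sigmoid edge function here is just an example of a user-defined function.

```python
# Sketch of the fused SDDMM + SpMM pattern over a CSR graph (illustration only).
import numpy as np
import scipy.sparse as sp

def fusedmm(adj_csr, H, edge_fn):
    """For each edge (u, v): s = <H[u], H[v]> (SDDMM part), then accumulate
    edge_fn(s) * H[v] into row u of the output (SpMM part), in one pass."""
    n, d = H.shape
    out = np.zeros((n, d))
    indptr, indices = adj_csr.indptr, adj_csr.indices
    for u in range(n):
        nbrs = indices[indptr[u]:indptr[u + 1]]
        if nbrs.size == 0:
            continue
        scores = H[nbrs] @ H[u]              # sampled dense-dense products for u's edges
        out[u] = edge_fn(scores) @ H[nbrs]   # weighted aggregation of neighbor embeddings
    return out

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))  # example user-defined edge function

if __name__ == "__main__":
    A = sp.random(1000, 1000, density=0.005, format="csr", random_state=0)
    H = np.random.default_rng(0).standard_normal((1000, 64))
    print(fusedmm(A, H, sigmoid).shape)
```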
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.