Force-directed algorithms are widely used to generate aesthetically pleasing layouts of graphs or networks arising in many scientific disciplines. To visualize large-scale graphs, several parallel algorithms have been discussed in the literature. However, existing parallel algorithms do not utilize the memory hierarchy efficiently and often offer limited parallelism. This paper addresses these limitations with BatchLayout, an algorithm that groups vertices into minibatches and processes them in parallel. BatchLayout also employs cache blocking techniques to utilize the memory hierarchy efficiently. Greater parallelism and improved memory access, coupled with force-approximation techniques, better initialization, and an optimized learning rate, make BatchLayout significantly faster than other state-of-the-art algorithms such as ForceAtlas2 and OpenOrd. The visualization quality of layouts from BatchLayout is comparable to or better than that of similar visualization tools. All of our source code, links to datasets, results, and log files are available at https://github.com/khaled-rahman/BatchLayout.
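The following is a minimal Python sketch of the minibatch idea described above: a spring-electrical layout that applies attractive forces along edges and repulsive forces between all pairs, updating one vertex minibatch at a time with a decaying learning rate. The function and parameter names (minibatch_layout, batch_size, lr) are illustrative assumptions, not BatchLayout's actual API, and the sketch omits the cache-blocking and force-approximation machinery of the real algorithm.

```python
# Hypothetical minibatch force-directed update (illustration only).
import numpy as np
import scipy.sparse as sp

def minibatch_layout(adj, dim=2, batch_size=256, iters=100, lr=0.1, seed=0):
    """Spring-electrical layout: attraction along edges, repulsion between
    all vertex pairs, applied one vertex minibatch at a time."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    pos = rng.uniform(-1, 1, size=(n, dim))              # random initialization
    adj = adj.tocsr()
    for _ in range(iters):
        for start in range(0, n, batch_size):            # vertices grouped into minibatches
            batch = np.arange(start, min(start + batch_size, n))
            diff = pos[batch, None, :] - pos[None, :, :]  # (b, n, dim) pairwise differences
            dist2 = np.maximum((diff ** 2).sum(-1), 1e-9)
            rep = (diff / dist2[..., None]).sum(axis=1)   # repulsion ~ 1/distance
            attr = np.zeros_like(rep)
            for i, u in enumerate(batch):                 # linear-spring attraction along edges
                nbrs = adj.indices[adj.indptr[u]:adj.indptr[u + 1]]
                attr[i] = (pos[nbrs] - pos[u]).sum(axis=0)
            pos[batch] += lr * (attr + rep)               # gradient-style minibatch step
        lr *= 0.99                                        # simple learning-rate decay
    return pos

if __name__ == "__main__":
    g = sp.random(500, 500, density=0.01, format="csr", random_state=1)
    g = ((g + g.T) > 0).astype(float)                     # symmetrize the random graph
    print(minibatch_layout(g).shape)
```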
This paper studies the overall system power variations of two multi-core architectures, an 8-core Intel and a 32-core AMD workstation, while using these machines to execute a wide variety of sequential and multi-threaded benchmarks under varying compiler optimization settings and runtime configurations. Our extensive experimental study provides insights for answering two questions: 1) what degree of impact can application-level optimizations have on reducing the overall system power consumption of modern CMP architectures; and 2) what strategies can compilers and application developers adopt to achieve a balance of performance and power efficiency for applications from a variety of scientific and embedded-systems domains.
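As a rough illustration of this kind of experiment (not the paper's actual harness), the sketch below rebuilds a benchmark at several compiler optimization levels and thread counts and reads package energy from Linux's Intel RAPL powercap interface. The source file name, binary name, and sysfs path are assumptions that depend on the machine; the AMD workstation in the study would expose a different energy interface, and the counter read here ignores wraparound.

```python
# Illustrative sweep over compiler optimization levels and thread counts,
# measuring package energy via /sys/class/powercap (Intel RAPL, Linux).
import pathlib
import subprocess
import time

RAPL = pathlib.Path("/sys/class/powercap/intel-rapl:0/energy_uj")  # package-0 energy counter

def read_energy_uj():
    return int(RAPL.read_text())

def measure(cmd):
    e0, t0 = read_energy_uj(), time.time()
    subprocess.run(cmd, check=True)
    e1, t1 = read_energy_uj(), time.time()
    return (e1 - e0) / 1e6, t1 - t0          # joules, seconds

for opt in ["-O0", "-O1", "-O2", "-O3"]:
    # "benchmark.c" is a placeholder for whichever benchmark is under test.
    subprocess.run(["gcc", opt, "-fopenmp", "benchmark.c", "-o", "bench"], check=True)
    for threads in [1, 2, 4, 8]:
        joules, secs = measure(["env", f"OMP_NUM_THREADS={threads}", "./bench"])
        print(f"{opt} threads={threads}: {joules:.1f} J, {secs:.2f} s, "
              f"{joules / secs:.1f} W average")
```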
We develop a fused matrix multiplication kernel that unifies sampled dense-dense matrix multiplication and sparse-dense matrix multiplication under a single operation called FusedMM. By using user-defined functions, FusedMM can capture almost all computational patterns needed by popular graph embedding and GNN approaches. FusedMM is an order of magnitude faster than its equivalent kernels in Deep Graph Library. The superior performance of FusedMM comes from its low-level vectorized kernels, a suitable load balancing scheme, and an efficient utilization of the memory bandwidth. FusedMM can tune its performance using a code generator and performs equally well on Intel, AMD, and ARM processors. FusedMM speeds up an end-to-end graph embedding algorithm by up to 28× on different processors.
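A minimal Python sketch of the fused SDDMM + SpMM pattern is shown below: for each edge the kernel computes a sampled dense-dense product, applies a user-defined edge function, and immediately aggregates the weighted neighbor embedding, without materializing the intermediate edge-value matrix. The real FusedMM kernels are low-level vectorized C code; the sigmoid edge function here is just an example of a user-defined function.

```python
# Sketch of the fused SDDMM + SpMM pattern over a CSR graph (illustration only).
import numpy as np
import scipy.sparse as sp

def fusedmm(adj_csr, H, edge_fn):
    """For each edge (u, v): s = <H[u], H[v]> (SDDMM part), then accumulate
    edge_fn(s) * H[v] into row u of the output (SpMM part), in one pass."""
    n, d = H.shape
    out = np.zeros((n, d))
    indptr, indices = adj_csr.indptr, adj_csr.indices
    for u in range(n):
        nbrs = indices[indptr[u]:indptr[u + 1]]
        if nbrs.size == 0:
            continue
        scores = H[nbrs] @ H[u]              # sampled dense-dense products for u's edges
        out[u] = edge_fn(scores) @ H[nbrs]   # weighted aggregation of neighbor embeddings
    return out

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))  # example user-defined edge function

if __name__ == "__main__":
    A = sp.random(1000, 1000, density=0.005, format="csr", random_state=0)
    H = np.random.default_rng(0).standard_normal((1000, 64))
    print(fusedmm(A, H, sigmoid).shape)
```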
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.