Data-intensive applications have drawn increasing attention in the last few years. Breadth-first search (BFS), the basic graph traversal algorithm and a typical data-intensive application, is widely used, and the Graph 500 benchmark uses it to rank the performance of supercomputers. The Intel Many Integrated Core (MIC) architecture, which is designed for highly parallel computing, has not been fully evaluated for graph traversal. In this paper, we discuss how to use the MIC to accelerate the BFS. We present optimizations for native BFS algorithms and develop a heterogeneous BFS algorithm. For the native BFS algorithm, we mainly discuss how to exploit the many cores and the wide vector processing units. The performance of our optimized native BFS implementation is 5.3 times the highest published performance for graphics processing units (GPUs). For the heterogeneous BFS algorithm, cooperative computing on the central processing unit (CPU) and the MIC achieves a speedup of approximately 1.4 times over the CPU alone for graphs with 2M vertices. This work is valuable for using a MIC to accelerate the BFS. It also offers general guidance for using a MIC in data-intensive applications.
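For readers unfamiliar with the kernel being accelerated, the following is a minimal serial sketch of a level-synchronous BFS that produces a parent array, the kind of output the Graph 500 benchmark validates. It is an illustration only, not the implementation evaluated in the paper; the function name `bfs_parents` and the adjacency-list input are assumptions made for this example.

```python
from typing import List


def bfs_parents(adj: List[List[int]], source: int) -> List[int]:
    """Level-synchronous BFS over an adjacency list (hypothetical helper).

    Returns parent[v] for every vertex; unreachable vertices keep -1,
    and the source is recorded as its own parent.
    """
    parent = [-1] * len(adj)
    parent[source] = source
    frontier = [source]              # vertices discovered in the previous level
    while frontier:
        next_frontier = []
        for u in frontier:           # in a parallel version, this loop is split across threads
            for v in adj[u]:
                if parent[v] == -1:  # first visit: record the parent and enqueue
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier     # advance one level
    return parent
```

The irregular accesses to `parent[v]` are what make the traversal data-intensive: nearly every step is a memory lookup rather than arithmetic.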
Data-intensive applications have drawn increasing attention in the last few years. Breadth-first search (BFS), a typical data-intensive application, is so widely used that the Graph 500 benchmark uses it to rank the performance of supercomputers. The Intel Many Integrated Core (MIC) architecture, which is designed for highly parallel computing, has not been fully evaluated for data-intensive applications. In this paper, we discuss how to use the MIC to accelerate the BFS. We discuss optimizations for both native mode and offload mode. For native mode, we propose optimizations for thread-level and data-level parallelism. We exploit thread-level parallelism by relaxing inter-thread dependences, and the optimized algorithm is shown to be more scalable. We exploit data-level parallelism with 512-bit single instruction, multiple data (SIMD) instructions, which yield a further speedup of up to 3.4 times. For offload mode, we present an offload algorithm. Through careful task partitioning and communication optimizations, it gains speedup for large graphs that cannot run natively on the MIC because of its limited memory size. We believe that this work is valuable for using the MIC to accelerate the BFS. It also serves as a general evaluation of the MIC for data-intensive applications.
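The data-level parallelism mentioned above can be illustrated by analogy with a bitmap-based BFS, in which one wide bitwise operation filters many candidate vertices at once. The sketch below is an assumption-laden illustration in plain Python, using arbitrary-precision integers as bitmaps in place of 512-bit vector registers; it is not the SIMD algorithm from the paper, and the name `bfs_bitmap_levels` and the per-vertex neighbor masks are hypothetical.

```python
from typing import List


def bfs_bitmap_levels(adj_masks: List[int], source: int) -> List[int]:
    """BFS with bitmap frontier and visited sets (hypothetical helper).

    adj_masks[u] has bit v set when there is an edge u -> v.
    Returns the BFS level of each vertex, or -1 if unreachable.
    """
    level = [-1] * len(adj_masks)
    visited = 1 << source
    frontier = visited
    depth = 0
    while frontier:
        reached = 0
        f = frontier
        while f:
            low = f & -f                  # isolate the lowest set bit
            u = low.bit_length() - 1      # index of that frontier vertex
            level[u] = depth
            reached |= adj_masks[u]       # gather all neighbors of u in one OR
            f ^= low                      # clear the processed bit
        # A single wide AND-NOT drops every already-visited vertex at once;
        # this is the step that maps naturally onto vector instructions.
        frontier = reached & ~visited
        visited |= frontier
        depth += 1
    return level
```

For example, with neighbor masks `[0b00110, 0b01001, 0b01001, 0b00110, 0]` and source 0, vertices 1 and 2 land on level 1, vertex 3 on level 2, and the isolated vertex 4 stays at -1.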