Graph analytics is extensively used in big-data applications such as social networks, web analysis, and bio-informatics. Most graph processing frameworks adopt the vertex-centric model because it is easy to use and program. However, frameworks based on vertex programming perform inefficiently on asynchronous graph analytics, for two reasons. First, vertex programming must guarantee sequential consistency, which requires frequent use of locks or atomic operations. Second, the algorithms are parallelized at the vertex level, so their latent parallelism cannot be exploited. To improve the parallel efficiency of asynchronous graph processing, Gauss-Seidel style algorithms in particular, this paper proposes a scheduling model based on Gauss-Seidel matrix computation. The model converts vertex programs into two main matrix operations, and the algorithms are then parallelized over row and column vectors. Compared to vertex programming, our model parallelizes algorithms at a finer granularity to exploit more latent parallelism, while retaining the ease-of-programming advantage of vertex programming. Instead of using locks to guarantee sequential consistency, our model uses a hybrid synchronization policy to reduce serialization among threads and the overhead of context switching. Furthermore, the model strengthens the locality of the program. Experimental results show that our model outperforms vertex-centric asynchronous frameworks in both performance and scalability. Moreover, it even surpasses the matrix-based synchronous framework GraphMat on some non-Gauss-Seidel style algorithms.
KEYWORDS
asynchronous graph processing, Gauss-Seidel method, matrix computation
INTRODUCTION
With the advent of the big data era, graph processing plays a vital role in various domains including social networks, 1,2 web analysis, 3,4 bio-informatics, 5 etc. To meet these application requirements, many graph processing frameworks 6-15 have emerged in recent years, developing several programming models. The vertex-centric programming model, also called "think like a vertex," is the most popular one due to its ease of programming.
However, the execution efficiency of graph analytics frameworks based on the vertex-centric programming model is unsatisfactory in some cases. For example, these frameworks perform inefficiently on asynchronous graph analytics, particularly the Gauss-Seidel style algorithms, a class of algorithms that process graphs with Gauss-Seidel logic and are extensively used in social networks, machine learning, linear systems, etc. Typical Gauss-Seidel style algorithms include label propagation, coordinate descent, and graph coloring; they are inherently sequential and show poor parallelism. Some frameworks based on the vertex-centric programming model support solving Gauss-Seidel style problems. Among them, GraphLab, 9 for instance, is a well-known one, which provides a round-robin scheduler with an edge consistency model for Gauss-Seidel style algorithms. Nevertheless, it is h...