Due to the rapid growth of the Internet and new scientific and technological advances, many new applications model data as graphs, because graphs are expressive enough to model complicated structures. The dominance of graphs in real-world applications demands new graph processing techniques to access large data graphs effectively and efficiently. In this paper, we study a graph pattern matching problem: finding all patterns in a large data graph that match a user-given graph pattern. We propose new two-step R-join (reachability join) algorithms with a filter step (R-semijoin) and a fetch step (R-join), utilizing a new cluster-based join index with graph codes in a relational database context. We also propose two optimization approaches to further optimize sequences of R-joins/R-semijoins. The first approach is based on R-join order selection followed by R-semijoin enhancement, and the second approach interleaves R-joins with R-semijoins. We conducted extensive performance studies, which confirm the efficiency of our proposed approaches.

Given the computed transitive closure $TC$, a reachability condition $X \hookrightarrow Y$ can be processed as an equijoin in SQL, and graph pattern matching can be processed as a sequence of equijoins. However, this requires either computing $TC$ online or materializing a precomputed $TC$. Neither is feasible: the former incurs a high computational cost, and the latter requires huge space.

In this work, instead, we maintain a data graph $G_D$ with $|\Sigma|$ labels in a relational database $G_{DB}$ using $|\Sigma|$ relations. In brief, for each label $X \in \Sigma$, we create a relation, denoted $T_X$, to maintain the extent of $X$-labeled nodes in $G_D$.
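To make the transitive-closure baseline concrete, the following is a minimal sketch of processing a reachability condition between two labels as an equijoin over a materialized transitive closure. It is an illustration only, not the method proposed in this paper. The table layout (`edge(src, dst)`, `TC(src, dst)`, one relation `T_<label>(node)` per label), the function name `reach_join`, and the three-edge example graph are all assumptions made for this sketch.

```python
import sqlite3

# Hypothetical toy graph: edges and node labels are illustrative assumptions.
EDGES = [("b3", "c2"), ("b4", "c2"), ("c2", "e0")]
LABELS = {"b3": "B", "b4": "B", "c2": "C", "e0": "E"}


def reach_join(edges, labels, x, y):
    """Return sorted (u, v) pairs where u has label x, v has label y,
    and v is reachable from u, using an equijoin over a materialized TC."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edge(src TEXT, dst TEXT)")
    con.executemany("INSERT INTO edge VALUES (?, ?)", edges)

    # Materialize the transitive closure with a recursive CTE.
    # UNION (not UNION ALL) removes duplicates, so cycles terminate.
    con.execute("""
        CREATE TABLE TC AS
        WITH RECURSIVE reach(src, dst) AS (
            SELECT src, dst FROM edge
            UNION
            SELECT r.src, e.dst FROM reach r JOIN edge e ON r.dst = e.src
        )
        SELECT src, dst FROM reach
    """)

    # One relation per label, holding the extent of nodes with that label.
    for lab in sorted(set(labels.values())):
        con.execute(f'CREATE TABLE "T_{lab}"(node TEXT PRIMARY KEY)')
    for node, lab in labels.items():
        con.execute(f'INSERT INTO "T_{lab}" VALUES (?)', (node,))

    # The reachability condition x -> y as an equijoin: T_x join TC join T_y.
    rows = con.execute(f"""
        SELECT tx.node, ty.node
        FROM "T_{x}" AS tx
        JOIN TC ON tx.node = TC.src
        JOIN "T_{y}" AS ty ON TC.dst = ty.node
        ORDER BY 1, 2
    """).fetchall()
    con.close()
    return rows
```

For example, `reach_join(EDGES, LABELS, "B", "E")` joins the B-labeled and E-labeled extents through `TC`. Note that `TC` can hold a number of pairs quadratic in the number of nodes, which is the space cost that motivates avoiding a materialized closure.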
Because transitive closure is essential for processing graph pattern matching, we maintain the transitive closure, $TC$, using a graph coding called 2-hop labeling [8], in the relations in $G_{DB}$.

A 2-hop labeling is a compressed representation of transitive closure [8], which assigns every node $v$ in graph $G_D$ a label $L(v) = (L_{in}(v), L_{out}(v))$, where $L_{in}(v), L_{out}(v) \subseteq V(G_D)$, and $u \mapsto v$ is true if and only if $L_{out}(u) \cap L_{in}(v) \neq \emptyset$. A 2-hop labeling for $G_D$ is derived from a 2-hop cover of $G_D$ that minimizes the set of $S(U_w, w, V_w)$, posed as a set cover problem. Here, $w \in V(G_D)$ is called a center, and $U_w, V_w \subseteq V(G_D)$. $S(U_w, w, V_w)$ implies that, for every node $u \in U_w$ and $v \in V_w$, $u \mapsto w$ and $w \mapsto v$, and therefore $u \mapsto v$. Consider Fig. 2; an example is $S(U_w, w, V_w) = S(\{b_3, b_4\}, c_2, \{e_0\})$. Here, $c_2$ is the center. It indicates: $b_3 \mapsto c_2$, $b_4 \mapsto c_2$, $c_2 \mapsto e_0$, $b_3 \mapsto e_0$, and $b_4 \mapsto e_0$. Several algorithms have been proposed to compute a 2-hop cover for $G_D$ quickly [9], [10], [11], [12] and to maintain such a computed 2-hop cover [10], [13]. Let $H = \{S_{w_1}, S_{w_2}, \ldots\}$ be the computed 2-hop cover, where $S_{w_i} = S(U_{w_i}, w_i, V_{w_i})$ and all $w_i$ are centers. The 2-hop labeling for a node $v$ is $L(v) = (L_{in}(v), L_{out}(v))$. Here, $L_{in}(v)$ i...