Existing global-local multiscale computational methods that use finite element discretization at both the macro-scale and the micro-scale are demanding in both computational time and memory, and their parallelization using domain decomposition methods incurs substantial communication overhead, which limits their application. We are interested in a class of explicit global-local multiscale methods whose architecture significantly reduces this communication overhead on massively parallel machines. However, a naïve task decomposition that distributes individual macro-scale integration points to a single group of processors is not optimal: it leads to communication overhead and idle processors. To overcome this problem, we have developed a novel coarse-grained parallel algorithm in which groups of macro-scale integration points are distributed to a layer of processors. Each processor in this layer communicates locally with a group of processors that are responsible for the micro-scale computations. The overlapping groups of processors are shown to achieve optimal concurrency at significantly reduced communication overhead. Several example problems are presented to demonstrate the efficiency of the proposed algorithm.

... to ensure both the efficiency and the reliability of these methods. While there has been excellent progress in the development of multiscale methods, the issue of efficiency has not received sufficient attention.

Scale linking is currently performed using hierarchical [2] and concurrent [2-9] schemes. Global-local multiscale methods [10-20] fall within the category of hierarchical multiscale methods, in which the stress-strain relationship at every integration point of the macro-scale is computed by suitably deforming an associated representative volume element (RVE).
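
Because every macro-scale integration point carries its own RVE, the micro-scale work can be grouped by integration point and handed to separate sets of processors. The following is a minimal sketch of such a two-level layout, not the algorithm proposed in this paper: the function names, the rank numbering, and the contiguous block partition are all assumptions made for illustration, and the processor groups here are disjoint for simplicity, whereas the abstract refers to overlapping groups.

```python
from typing import Dict, List


def block_partition(n_items: int, n_parts: int) -> List[range]:
    """Split range(n_items) into n_parts contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_parts)
    blocks, start = [], 0
    for part in range(n_parts):
        # The first `extra` blocks receive one additional item.
        size = base + (1 if part < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def two_level_layout(n_points: int, n_macro: int, micro_per_macro: int) -> List[Dict]:
    """Pair each macro-layer rank with a group of integration points and micro-scale workers.

    Hypothetical numbering (an assumption, not taken from the paper):
    ranks 0..n_macro-1 form the macro layer, and the remaining ranks are
    dealt out in consecutive blocks of micro-scale workers.
    """
    layout = []
    for macro_rank, points in enumerate(block_partition(n_points, n_macro)):
        first_worker = n_macro + macro_rank * micro_per_macro
        layout.append({
            "macro_rank": macro_rank,
            "points": list(points),  # macro-scale integration points owned by this rank
            "micro_ranks": list(range(first_worker, first_worker + micro_per_macro)),
        })
    return layout


if __name__ == "__main__":
    for group in two_level_layout(n_points=10, n_macro=3, micro_per_macro=2):
        print(group)
```

In this sketch each macro-layer rank exchanges data only with its own micro-scale workers, which mirrors the local communication pattern described in the abstract. With an MPI library, such groups would typically be realized as sub-communicators.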
The major advantage of this class of methods is the ability to model arbitrary non-linearities at the micro-scale, since no a priori constitutive assumption is made at the macro-scale. Whereas finite elements are used to discretize the macro-scale, a variety of techniques have been used to model the RVE, including traditional finite elements [15,18-20], the Voronoi cell finite element method (VCFEM) [13,14], a crystal plasticity framework [16], and numerical methods based on Fast Fourier Transforms [17,21]. However, a major disadvantage of these fully coupled computational techniques is that they are demanding in terms of processor time and memory.

Interesting attempts to improve the computational efficiency include reformulation of the global problem [22] and incorporation of micro-scale effects directly into the finite element basis functions to capture their effect on the macro-scale [23]. In the latter approach, the construction of the basis functions is fully decoupled from one element to the next, and hence it is naturally suited to massively parallel computing. In [24], a structural decomposition-based parallel computation strategy is used for the multiscale computation,...