Information cascade size prediction is one of the primary challenges in understanding the diffusion of information. Traditional feature‐based methods rely heavily on the quality of handcrafted features, which require extensive domain knowledge and are hard to generalize to new domains. Recently, inspired by the success of deep learning in computer vision and natural language processing, researchers have developed neural network‐based approaches to this problem. However, existing deep learning‐based methods either focus on modeling the temporal characteristics of cascades while ignoring structural information, or fail to take the order‐scale and position‐scale into consideration when modeling the structure of information propagation. This paper proposes a novel graph neural network‐based model, called MUCas, that learns latent representations of cascade graphs from a multi‐scale perspective. MUCas makes full use of the direction‐scale, high‐order‐scale, position‐scale, and dynamic‐scale of cascades via a newly designed MUlti‐scale Graph Capsule Network (MUG‐Caps) and an influence‐attention mechanism. Extensive experiments on two real‐world data sets demonstrate that MUCas significantly outperforms state‐of‐the‐art approaches.