Recent advances in Graph Neural Networks (GNNs) have achieved impressive results on many challenging tasks, such as few-shot learning. Despite their capacity to learn and generalize from only a few annotated samples, GNNs are limited in scalability, as deep GNN models usually suffer from severe over-fitting and over-smoothing. In this work, we propose a novel GNN framework with a triple-attention mechanism, namely node self-attention, neighbor attention, and layer memory attention, to tackle these challenges. We provide both theoretical analysis and illustrations to explain why the proposed attentive modules improve the scalability of GNNs for few-shot learning tasks. Our experiments show that the proposed Attentive GNN model outperforms state-of-the-art GNN-based and non-GNN few-shot learning methods. The improvement is consistent across the mini-ImageNet, tiered-ImageNet, CUB-200-2011, and Flowers-102 benchmarks, with both ConvNet-4 and ResNet-12 backbones, and under both the inductive and transductive settings. Furthermore, extensive experiments demonstrate the superiority of our method on few-shot fine-grained and semi-supervised classification tasks. The code for this work is publicly available at https://github.com/chenghao-ch94/AGNN.