Few-shot learning aims to learn a novel concept from a handful of labeled samples. Because so little training data is available, deep networks are prone to over-fitting. Although many previous metric-based approaches have made significant progress on this challenge, they ignore the association between the query set and the support set when learning sample representations, and they fail to focus attention on the target region. To address these issues, we propose a novel feature transformation network (FTN) for few-shot image classification. Generalizing to other instances from only a few examples requires a model with a more discriminative representation of the target attributes and robust generalization ability. To this end, we introduce an attention-based affinity matrix that transforms the semantically enhanced embedding vectors of query samples by associating them with the support set, thereby guiding the network to learn sample representations that carry richer semantic information about the target region. Furthermore, to highlight the object region in the feature maps and make the similarity measurement between samples more targeted, we design a global and local feature fusion module that fuses the features of the support set samples. Comprehensive experiments validate the feasibility of our model, and our method achieves state-of-the-art performance on two public benchmark datasets: the general object dataset mini-ImageNet and the fine-grained dataset Caltech-UCSD Birds-200-2011 (CUB).

INDEX TERMS Few-shot learning, feature transformation, feature fusion, metric criterion.
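
As a rough illustration of the affinity-based transformation mentioned in the abstract, the following minimal sketch computes a row-normalized affinity between query and support embeddings and uses it to mix support information into each query embedding. It is not the FTN formulation itself: the scaled dot-product similarity, the softmax normalization, the residual addition, and the names `affinity_matrix` and `transform_queries` are all assumptions chosen for illustration, and the toy embeddings are made up.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def affinity_matrix(query, support):
    """Affinity between every query and every support embedding.

    Assumption: scaled dot-product similarity followed by a softmax over
    the support samples, so each row sums to 1.
    """
    scale = math.sqrt(len(query[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, s)) / scale for s in support])
        for q in query
    ]


def transform_queries(query, support):
    """Add an affinity-weighted mixture of support embeddings to each query.

    Assumption: a residual form (query + mixture); other combinations
    are equally plausible.
    """
    aff = affinity_matrix(query, support)
    dim = len(support[0])
    out = []
    for q, weights in zip(query, aff):
        mixed = [sum(w * s[k] for w, s in zip(weights, support)) for k in range(dim)]
        out.append([q_k + m_k for q_k, m_k in zip(q, mixed)])
    return out


# Toy example: two support embeddings and one query embedding of dimension 2.
support = [[1.0, 0.0], [0.0, 1.0]]
query = [[2.0, 0.0]]
affinity = affinity_matrix(query, support)
transformed = transform_queries(query, support)
```

In this toy case the query is aligned with the first support embedding, so the first affinity weight dominates and the transformed query moves further along that direction.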