In cross-domain hyperspectral image (HSI) classification, labeled samples in the target domain are very limited, so it is worthwhile to draw sufficient class information from the source domain in order to classify the target-domain classes (both shared classes and new, unseen classes). This article addresses the problem by employing few-shot learning (FSL) in a meta-learning paradigm. However, most existing cross-domain FSL methods extract statistical features with convolutional neural networks (CNNs), which typically consider only the local spatial information among features and ignore global information. To overcome this shortcoming, this article proposes a novel convolutional transformer-based few-shot learning (CTFSL) method. Specifically, FSL is first performed on the classes of the source and target domains simultaneously to build a consistent scenario. Then, a domain aligner maps the source and target domains to the same dimensions. In addition, a convolutional transformer (CT) network is used to extract local-global features. Finally, a domain discriminator is applied that not only reduces domain shift but also identifies the domain from which a feature originates. Experiments on three widely used HSI datasets indicate that the proposed CTFSL method outperforms state-of-the-art cross-domain FSL methods and several typical HSI classification methods in terms of classification accuracy.
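
To make the first step concrete, the following is a minimal sketch of how N-way K-shot episodes might be drawn from the source and target domains in the same training iteration. It is an illustration under stated assumptions, not the CTFSL implementation: the function name `sample_episode`, the toy label lists, and the choices of 4 ways, 1 shot, and 4 queries per class are all hypothetical, and the domain aligner, CT network, and domain discriminator are not modeled here.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode from a list of class labels.

    Returns (support, query), each a list of (sample_index, episode_label)
    pairs, where episode_label is in range(n_way).
    """
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    # Only classes with enough labeled samples can take part in an episode.
    eligible = sorted(c for c, idxs in by_class.items()
                      if len(idxs) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        # Sample without replacement so support and query never overlap.
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


rng = random.Random(0)

# Hypothetical toy labels: a richly labeled source scene and a target scene
# with only five labeled pixels per class.
source_labels = [c for c in range(10) for _ in range(30)]
target_labels = [c for c in range(4) for _ in range(5)]

# One episode per domain in the same iteration (assumed settings).
src_support, src_query = sample_episode(source_labels, 4, 1, 4, rng)
tgt_support, tgt_query = sample_episode(target_labels, 4, 1, 4, rng)

print(len(src_support), len(src_query), len(tgt_support), len(tgt_query))
# prints: 4 16 4 16
```

In this sketch the two domains share one sampler, so both episodes have the same structure even though the target pool is far smaller. How the episodes are then combined in the loss is left open here, since the abstract does not specify it.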