In recent years, deep learning has drawn much attention for its outstanding performance in computer vision, [1] speech recognition, [2] and natural language processing. [3] These tasks rely heavily on deep neural networks (DNNs) to model the prior knowledge distribution in massive amounts of data. In most cases, acquiring sufficient training data is even more critical than the design of the DNN model itself. [4] Nevertheless, learning from a limited number of training samples is more in line with how the human brain works: a child can easily distinguish a dog from a cat once taught what a dog looks like with a single example. This setting, however, poses a great challenge to mainstream neural networks, because DNNs fail to learn the features of a dog from only one picture. Few-shot learning (FSL), a branch of meta-learning, [5] has accordingly been proposed to build reliable machine learning models from a few labeled examples, just as humans do. Among the popular FSL algorithms, the combination of a dynamic external memory with a neural network controller, [6] known as the memory-augmented neural network (MANN), [7] stands out for its performance on FSL tasks. The external memory, usually DRAM, [7,8] retains the features of the scarce samples over a long time and consistently helps improve FSL performance. However, the time delay caused by data communication between the memory and the running neural network is costly and degrades the performance of data search in the external memory. Emerging non-volatile memory has become a viable solution to this problem, as it greatly improves the search speed and shows potential for lifelong learning tasks.

Various non-volatile devices, such as flash, [9] FeFETs, [10,11] resistive random-access memory (RRAM), [12-14] and phase-change memory (PCM), [15] have been used as external memory to accelerate the similarity computation in MANNs. Ni et al. [11] proposed a two-FeFET scheme as a ternary content-addressable memory (ternary CAM) cell that calculates the Hamming distances of binary vectors for data indexing. However, this framework suffers from two problems: reduced accuracy compared with digital computers and the long bit-width needed to encode the binarized vectors. Karunaratne et al. [15] then designed a new attention mechanism that improves the accuracy of similarity-based FSL by computing the cosine distance of binarized vectors on PCM. The embedding vectors in refs. [11,15] are 512 bits long for the Omniglot dataset, leading to a large area and power overhead in the external memory. Kazemi et al. [10] developed an analog CAM with FeFETs based on the structure in ref. [11] and tried to reduce the encoding
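To make the similarity-search step concrete, the following is a minimal Python sketch, not taken from the cited works, of how a MANN-style few-shot classifier might index binarized support vectors by Hamming distance (the quantity computed by the ternary CAM of ref. [11]) or by cosine distance (as in the attention mechanism of ref. [15]). The class name KeyMemory, the 512-bit vector length, and the 5-way 1-shot episode are all illustrative assumptions.

import numpy as np

class KeyMemory:
    """Hypothetical external key memory holding binarized support embeddings."""

    def __init__(self, dim: int):
        self.keys = np.empty((0, dim), dtype=np.uint8)  # one row per support sample
        self.labels = []

    def write(self, key: np.ndarray, label: int) -> None:
        # Store one binarized support embedding together with its class label.
        self.keys = np.vstack([self.keys, key.astype(np.uint8)])
        self.labels.append(label)

    def read_hamming(self, query: np.ndarray) -> int:
        # Hamming distance: count of differing bits between the query and each key.
        dists = np.count_nonzero(self.keys != query.astype(np.uint8), axis=1)
        return self.labels[int(np.argmin(dists))]

    def read_cosine(self, query: np.ndarray) -> int:
        # Cosine similarity over {0,1} vectors; the largest value wins.
        q = query.astype(np.float64)
        k = self.keys.astype(np.float64)
        sims = (k @ q) / (np.linalg.norm(k, axis=1) * np.linalg.norm(q) + 1e-12)
        return self.labels[int(np.argmax(sims))]

# Usage: a 5-way 1-shot episode with 512-bit keys, mirroring the vector
# length reported for refs. [11,15] on the Omniglot dataset.
rng = np.random.default_rng(0)
mem = KeyMemory(dim=512)
for c in range(5):
    mem.write(rng.integers(0, 2, 512), label=c)
query = rng.integers(0, 2, 512)
print(mem.read_hamming(query), mem.read_cosine(query))

In hardware, the two read functions correspond to the in-memory distance computations that the CAM and PCM arrays perform in place, which is what removes the memory-to-processor communication delay discussed above.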