As an emerging technology for the early diagnosis of gastrointestinal diseases, confocal
endoscopy lacks large-scale, accurately annotated data, which makes learning discriminative
semantic features a major challenge. How, then, should we learn representations with no labels,
or with only a few? In this paper, we propose Feature-Level MixSiam, a method built on the
traditional siamese network for medical image recognition, and apply it to gastrointestinal (GI)
disease classification by learning discriminative features from limited probe-based confocal laser
endomicroscopy (pCLE) images. The proposed method comprises two stages: self-supervised
learning and few-shot learning. First, in the self-supervised learning stage, a novel feature-level
mixing approach introduces more task-relevant information via regularization, enabling the
traditional siamese structure to adapt to the large intra-class variance of the pCLE dataset.
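The abstract does not spell out the exact mixing formulation, so the sketch below is only one plausible reading: in a SimSiam-style siamese network, the projected features of the two augmented views are interpolated with a Beta-sampled coefficient, and the mixed feature serves as an additional stop-gradient regression target, i.e., a regularizer on top of the standard negative-cosine loss. The class name FeatureLevelMixSiam, the alpha parameter, and the loss_mix term are illustrative assumptions, not the authors' exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureLevelMixSiam(nn.Module):
    """SimSiam-style siamese model with an assumed feature-level mixing
    regularizer (hypothetical reconstruction, not the paper's exact code)."""

    def __init__(self, backbone: nn.Module, feat_dim: int = 512,
                 proj_dim: int = 128, alpha: float = 1.0):
        super().__init__()
        self.backbone = backbone  # e.g., a ResNet trunk with its fc removed
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, proj_dim), nn.BatchNorm1d(proj_dim),
            nn.ReLU(inplace=True), nn.Linear(proj_dim, proj_dim))
        self.predictor = nn.Sequential(
            nn.Linear(proj_dim, proj_dim), nn.BatchNorm1d(proj_dim),
            nn.ReLU(inplace=True), nn.Linear(proj_dim, proj_dim))
        self.alpha = alpha  # Beta-distribution parameter for mixing

    @staticmethod
    def _neg_cosine(p, z):
        # SimSiam loss: negative cosine similarity with a stop-gradient target.
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

    def forward(self, x1, x2):
        # Encode and project the two augmented views of the same image.
        z1 = self.projector(self.backbone(x1))
        z2 = self.projector(self.backbone(x2))
        p1, p2 = self.predictor(z1), self.predictor(z2)

        # Standard symmetric SimSiam objective.
        loss_sim = 0.5 * (self._neg_cosine(p1, z2) + self._neg_cosine(p2, z1))

        # Feature-level mixing: interpolate the two views' features and
        # regress both predictions onto the mixed target (regularization).
        lam = torch.distributions.Beta(self.alpha, self.alpha).sample().item()
        z_mix = lam * z1 + (1.0 - lam) * z2
        loss_mix = 0.5 * (self._neg_cosine(p1, z_mix) + self._neg_cosine(p2, z_mix))

        return loss_sim + loss_mix
```

A training step would simply compute loss = model(view1, view2) on two augmentations of each pCLE image and backpropagate; the mixed target is intended to smooth the feature space against the large intra-class variance noted above.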
Then, in the few-shot learning stage, we adopt the model pre-trained through self-supervised
learning as the base learner in the few-shot pipeline, enabling the feature extractor to learn
richer, more transferable visual representations that generalize rapidly to other pCLE
classification tasks when labeled data are limited.
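The few-shot pipeline is likewise not detailed here; one common instantiation, assumed for the sketch below, is a prototype-based classifier in which the frozen pre-trained encoder embeds the support and query images of an N-way K-shot episode and each query is assigned to its nearest class prototype. The helper prototype_classify and its signature are hypothetical.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def prototype_classify(encoder, support_x, support_y, query_x, n_way: int):
    """N-way K-shot episode with a frozen, self-supervised pre-trained encoder.

    support_x: (n_way * k_shot, C, H, W) labeled support images
    support_y: (n_way * k_shot,) integer labels in [0, n_way)
    query_x:   (n_query, C, H, W) unlabeled query images
    Returns the predicted class index for each query image.
    """
    encoder.eval()
    s_feat = F.normalize(encoder(support_x), dim=-1)  # (N*K, D)
    q_feat = F.normalize(encoder(query_x), dim=-1)    # (Q, D)

    # One prototype per class: the mean embedding of its support samples.
    protos = torch.stack([s_feat[support_y == c].mean(dim=0)
                          for c in range(n_way)])     # (N, D)
    protos = F.normalize(protos, dim=-1)

    # Nearest-prototype assignment by cosine similarity.
    logits = q_feat @ protos.t()                      # (Q, N)
    return logits.argmax(dim=-1)
```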
The proposed method is evaluated on two disjoint pCLE gastrointestinal image datasets. Under
the linear evaluation protocol, Feature-Level MixSiam outperforms the baseline by 6% (Top-1)
and the supervised model by 2% (Top-1), demonstrating the effectiveness of the proposed
feature-level mixing method. Furthermore, the proposed method outperforms the previous
baseline on the few-shot classification task, and can thus help improve the classification of
pCLE images across different stages of tumor development, where large-scale annotated data
are lacking. Tested on two different datasets, the method shows promise as a quantitative tool
for assisting pathologists in disease diagnosis.
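For completeness, the linear evaluation protocol cited above conventionally freezes the pre-trained encoder and trains only a linear classifier on its features; the sketch below follows that convention, with the helper name linear_evaluation, feat_dim, and the optimizer settings chosen for illustration.

```python
import torch
import torch.nn as nn

def linear_evaluation(encoder, train_loader, num_classes: int,
                      feat_dim: int = 512, epochs: int = 30,
                      lr: float = 0.1, device: str = "cuda"):
    """Train a linear classifier on top of a frozen pre-trained encoder."""
    encoder.to(device).eval()
    for p in encoder.parameters():
        p.requires_grad = False  # freeze the representation

    head = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(head.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                feats = encoder(x)  # frozen features, no gradient
            loss = loss_fn(head(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```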