“…With the rise of deep neural networks, handcrafted features in prototype learning were replaced by CNN features, enabling end-to-end training within deep networks and yielding high accuracy and robustness across a variety of image processing tasks. The great variety of existing approaches can be roughly grouped by: i) the number of prototypes used to represent a category (1-per-class [19,20,23,26,28,29,34], n-per-class [25,27,30,31,33], sparse [18]); ii) the distance measure, or combination of measures, used to quantify the similarity between each instance-prototype pair (Euclidean distance [18-20,25,26,30,31,33], Mahalanobis distance [19,20], covariance distance [29], cosine distance [26], learned distance [27,28], hand-designed distance [33,34]); and iii) the approach used for prototype representation (prototype-template image [18,23], mean vector of embedded features [25,26,28,29,31], learned centroid vector [19,20,27,30,34], learned CNN tensor [33]).…”
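To make the taxonomy concrete, the sketch below illustrates one common combination of these design choices: a single prototype per class, represented as the mean vector of embedded features, with queries assigned to the class whose prototype is nearest in Euclidean distance. The function names, array shapes, and toy data are ours for illustration only and do not reproduce any specific cited method.

```python
import numpy as np

def compute_prototypes(embeddings, labels, num_classes):
    """One prototype per class: the mean of that class's embedded feature vectors."""
    dim = embeddings.shape[1]
    prototypes = np.zeros((num_classes, dim))
    for c in range(num_classes):
        prototypes[c] = embeddings[labels == c].mean(axis=0)
    return prototypes

def classify(query_embeddings, prototypes):
    """Assign each query to the class of its nearest prototype (squared Euclidean distance)."""
    # Pairwise squared distances, shape (n_queries, n_classes)
    diffs = query_embeddings[:, None, :] - prototypes[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)

# Toy usage: random vectors stand in for CNN-embedded features of 4 classes.
rng = np.random.default_rng(0)
support = rng.normal(size=(20, 64))            # 20 support embeddings, 64-d
labels = np.repeat(np.arange(4), 5)            # 5 support samples per class
protos = compute_prototypes(support, labels, num_classes=4)
queries = rng.normal(size=(5, 64))
print(classify(queries, protos))
```

Swapping the distance function or replacing the class-mean with a learned centroid vector reproduces other entries of the taxonomy without changing the overall structure.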