“…The majority of these loss functions are pair-based, e.g., contrastive loss [6,12], triplet loss [7,13], N-pair loss [14], hierarchical triplet loss [15], ranked list loss [16], and multi-similarity loss with general pair weighting [17]. Besides, some loss functions adopt a proxy mechanism, such as proxy Neighborhood Component Analysis (ProxyNCA) [18] and proxy anchor loss [19], to speed up the convergence of model training; here the optimization is carried out on triplets of proxies, each involving an anchor point and similar/dissimilar proxy points. In addition, histogram loss [3] was explored for learning deep embeddings by estimating the two distributions of similarities over positive and negative sample pairs.…”
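To make the pair-based family concrete, the sketch below shows the classic triplet loss on toy 2-D embeddings. This is an illustrative NumPy implementation, not the exact formulation of any cited paper; the embedding values and the margin of 0.2 are arbitrary assumptions for demonstration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: pull the anchor toward the positive
    sample and push it at least `margin` farther from the negative."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings (illustrative values only).
a = np.array([0.0, 0.0])          # anchor
p = np.array([0.1, 0.0])          # same class, close to the anchor
n_far = np.array([1.0, 0.0])      # different class, already far away
n_near = np.array([0.2, 0.0])     # different class, violating the margin

loss_satisfied = triplet_loss(a, p, n_far)   # constraint met, loss is 0
loss_violated = triplet_loss(a, p, n_near)   # positive loss, triplet is active
```

Proxy-based losses such as ProxyNCA and proxy anchor follow the same pull/push intuition, but replace the positive and negative samples with learned class proxies, which reduces the number of tuples to optimize over and speeds up convergence.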