Multi-head ensemble structures have been employed by many algorithms across various applications, including deep metric learning. However, these structures have typically been designed empirically in a simple manner, for example by reusing the same structure for every head, which limits the ensemble effect due to a lack of head diversity. In this paper, to design the multi-head ensemble structure more elaborately, we establish design concepts based on three structural factors: designing the feature layer to extract an ensemble-favorable feature vector, designing the shared part for memory savings, and designing diverse multi-heads for performance improvement. Through a rigorous evaluation of structural variants based on these design concepts, we propose a heterogeneous double-head ensemble structure that substantially increases the ensemble gain while also saving memory. In verification experiments on image retrieval datasets, the proposed ensemble structure outperforms state-of-the-art algorithms by margins of over 5.3%, 6.1%, 5.9%, and 1.8% on CUB-200, Cars-196, SOP, and In-Shop, respectively.

INDEX TERMS Ensemble learning, multi-head structure, deep metric learning, deep architecture design, image retrieval
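To make the three structural factors concrete, the following is a minimal, hypothetical PyTorch sketch of a shared backbone feeding two structurally different embedding heads. It is not the paper's exact architecture: the ResNet-50 trunk, pooling choices, layer sizes, and the concatenation of the two embeddings at retrieval time are all illustrative assumptions, chosen only to show how a shared part can save memory while heterogeneous heads provide diversity.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact design):
# a shared backbone whose output feeds two heterogeneous embedding heads.
import torch
import torch.nn as nn
import torchvision.models as models


class HeterogeneousDoubleHead(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # Shared part: a ResNet-50 trunk up to the last conv block (assumed backbone),
        # shared by both heads for memory savings.
        backbone = models.resnet50(weights=None)
        self.shared = nn.Sequential(*list(backbone.children())[:-2])  # -> (B, 2048, H, W)

        # Head A: global average pooling followed by a linear embedding layer.
        self.head_a = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(2048, embed_dim)
        )
        # Head B: global max pooling followed by a small MLP, so its structure
        # differs from head A (the source of head diversity in this sketch).
        self.head_b = nn.Sequential(
            nn.AdaptiveMaxPool2d(1), nn.Flatten(),
            nn.Linear(2048, 1024), nn.ReLU(inplace=True), nn.Linear(1024, embed_dim),
        )

    def forward(self, x: torch.Tensor):
        feat = self.shared(x)  # feature map from the shared trunk
        emb_a = nn.functional.normalize(self.head_a(feat), dim=1)
        emb_b = nn.functional.normalize(self.head_b(feat), dim=1)
        # Each head can be trained with its own metric-learning loss; at retrieval
        # time the L2-normalized embeddings are concatenated as the ensemble descriptor.
        return torch.cat([emb_a, emb_b], dim=1), (emb_a, emb_b)


if __name__ == "__main__":
    model = HeterogeneousDoubleHead()
    ensemble_emb, _ = model(torch.randn(2, 3, 224, 224))
    print(ensemble_emb.shape)  # torch.Size([2, 1024])
```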