“…However, previous methods [5,9,10] mostly rely on inter-media constraints (such as correlation constraints [10]) and intra-media constraints (such as semantic constraints [5] or reconstruction constraints [9,10]) to build common representations for cross-media retrieval. Optimizing common representation learning is challenging because the inter-media and intra-media constraints must both be treated as objective functions [13,14], and this difficulty limits the performance of cross-media retrieval.…”