With impressive results in applications relying on feature learning, deep learning has also blurred the line between algorithm and data. Pick a training dataset, pick a backbone network for feature extraction, and voilà ; this usually works for a variety of use cases. But the underlying hypothesis that there exists a training dataset matching the use case is not always met. Moreover, the demand for interconnections regardless of the variations of the content calls for increasing generalization and robustness in features.An interesting application characterized by these problematics is the connection of historical and cultural databases of images. Through the seemingly simple task of instance retrieval, we propose to show that it is not trivial to pick features responding well to a panel of variations and semantic content. Introducing a new enhanced version of the ALEGORIA benchmark, we compare descriptors using the detailed annotations. We further give insights about the core problems in instance retrieval, testing four state-of-the-art additional techniques to increase performance.