“…This method received much attention, and since its introduction several modifications and extensions have been proposed [22,12,4,5]; furthermore, the data used by Duygulu et al. have become a benchmark for comparing image annotation methods [18,15,14]. Several successful semi-supervised methods have been proposed [6,1,2,4,11,16,5], some of which outperform the previous work [11]. The intuitive idea in most of these methods is to introduce latent variables for modeling the joint (or conditional) probability of words and regions.…”