Automatic image annotation (AIA) is a key technology in image understanding and pattern recognition, and it is becoming increasingly important for annotating large-scale image collections. Over the past decade, nearest neighbor-based AIA methods have proved the most successful among the classical models. This family of models faces four major challenges: the semantic gap, label imbalance, a wide range of labels, and weak labeling. In this paper, we propose a novel annotation model based on a three-pass KNN (k-nearest neighbor) scheme to address these challenges. The key idea is to identify appropriate neighbors at each KNN pass. In the first pass, we identify the most relevant categories based on label features rather than the visual features used by traditional models. In the second pass, we determine the relevant images based on multi-modal (visual and textual label) embedding features. Because the test image has not yet been annotated with any label, we propose a pre-annotation strategy before image annotation to raise its semantic level. In the third pass, we capture relevant labels from semantically and visually similar images and propagate them to the given unlabeled image. In contrast with traditional nearest neighbor-based methods, our method inherently alleviates the problems of the semantic gap, label imbalance, and a wide range of labels. In addition, to alleviate the issue of weak labeling, we propose label refinement for the training images. Extensive experiments on three classical benchmark datasets demonstrate that the proposed method significantly outperforms the state of the art in terms of per-label and per-image metrics.

INDEX TERMS Automatic image annotation, semantic gap, nearest neighbor, weak-labeling.
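The three-pass pipeline described above can be illustrated with a minimal sketch. This is not the paper's actual implementation: the cosine-similarity measure, the pre-annotation via the visually nearest training image, the concatenation-based multi-modal fusion, and all function and parameter names (`three_pass_knn`, `k1`, `k2`, `k3`) are illustrative assumptions.

```python
import numpy as np

def cosine_sim(query, matrix):
    # Cosine similarity between a query vector and each row of a matrix.
    q = query / (np.linalg.norm(query) + 1e-12)
    m = matrix / (np.linalg.norm(matrix, axis=1, keepdims=True) + 1e-12)
    return m @ q

def three_pass_knn(query_visual, train_visual, train_label_vecs,
                   train_labels, category_of, k1=2, k2=3, k3=5):
    """Hypothetical sketch of a three-pass KNN annotator.
    Pass 1: select the k1 most relevant categories using label-space features.
    Pass 2: select k2 neighbor images within those categories using a
            fused multi-modal (visual + label) embedding.
    Pass 3: propagate the k3 most frequent labels of those neighbors."""
    # Pre-annotation: the unlabeled query borrows the label features of its
    # visually nearest training image to raise its semantic level.
    nearest = int(np.argmax(cosine_sim(query_visual, train_visual)))
    pseudo_label_vec = train_label_vecs[nearest]

    # Pass 1: rank categories by the best label-feature similarity they reach.
    cat_scores = {}
    for i, cat in enumerate(category_of):
        s = float(cosine_sim(pseudo_label_vec, train_label_vecs[i:i + 1])[0])
        cat_scores[cat] = max(cat_scores.get(cat, -1.0), s)
    top_cats = sorted(cat_scores, key=cat_scores.get, reverse=True)[:k1]

    # Pass 2: restrict to those categories; rank by a fused embedding built
    # here by simple concatenation of visual and label features.
    idx = [i for i, cat in enumerate(category_of) if cat in top_cats]
    fused_train = np.hstack([train_visual[idx], train_label_vecs[idx]])
    fused_query = np.concatenate([query_visual, pseudo_label_vec])
    order = np.argsort(-cosine_sim(fused_query, fused_train))[:k2]
    neighbors = [idx[i] for i in order]

    # Pass 3: propagate labels from the selected neighbors by vote count.
    votes = {}
    for i in neighbors:
        for lab in train_labels[i]:
            votes[lab] = votes.get(lab, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[:k3]
```

Restricting passes 2 and 3 to the categories chosen in pass 1 is what lets rare labels compete only against images from semantically related categories, which is the intuition behind the claimed relief of label imbalance.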