Effective image and video annotation is a cornerstone of computer vision and artificial intelligence, and it is crucial for developing accurate machine learning models. Object tracking and image retrieval techniques are essential to this process because they can significantly improve the efficiency and accuracy of automatic annotation. This paper systematically reviews object tracking and image retrieval techniques and explores how they can jointly enhance the annotation of image and video datasets. Object tracking is examined for its role in automating annotation by following objects across video sequences, while image retrieval is evaluated for its ability to suggest annotations for new images based on existing data. The review covers diverse methodologies, including advanced neural networks and machine learning techniques, and highlights their effectiveness in contexts such as medical analysis and urban monitoring. Despite notable advances, challenges remain, including algorithm robustness and effective human-AI collaboration. The review provides insight into the current state and future potential of these technologies for improving annotation processes, presenting existing applications of each technique and the potential of combining them.