In computer vision, re-identifying targets across multiple video frames has emerged as a pivotal challenge, particularly in video surveillance, intelligent transportation systems, and pedestrian flow analysis. Conventional re-identification techniques often struggle with varying camera viewpoints, inconsistent illumination, and frequent occlusions. To address these challenges, this paper proposes MVF-Re, a re-identification approach that combines adaptive attention mechanisms with multi-scale feature fusion. First, we design a deep attention-enhanced feature pyramid network that adapts dynamically to video frame content and thereby captures fine-grained target details. Second, we employ a multi-input Siamese network to extract consistent, robust features across diverse conditions. To further improve feature discriminability, we devise a context-aware dynamic attention mechanism that adaptively assigns weights to individual video frames. Finally, we introduce a multi-scale feature fusion strategy that yields a comprehensive and robust target representation. Experiments on multiple benchmark datasets demonstrate that our method outperforms existing approaches in multi-frame target re-identification.

INDEX TERMS multi-video-frame target re-identification, deep attention mechanism, Siamese network, context-aware dynamic attention
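To make the two attention-related ideas in the abstract concrete, the following is a minimal sketch of (1) a context-aware dynamic attention that scores each frame's feature against a clip-level context before pooling, and (2) a simple multi-scale feature fusion that projects pyramid levels to a common width. All module names, dimensions, and the mean-pooled context are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextAwareFrameAttention(nn.Module):
    """Weights per-frame features by their relevance to a clip-level context."""

    def __init__(self, dim: int):
        super().__init__()
        # Scores each [frame ; context] pair; the concatenation scheme is an assumption.
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_frames, dim)
        context = frames.mean(dim=1, keepdim=True)             # clip-level context (assumed: mean pooling)
        context = context.expand(-1, frames.size(1), -1)       # broadcast context to every frame
        logits = self.score(torch.cat([frames, context], -1))  # (batch, num_frames, 1)
        weights = F.softmax(logits, dim=1)                     # per-frame attention weights
        return (weights * frames).sum(dim=1)                   # attention-weighted clip feature


class MultiScaleFusion(nn.Module):
    """Projects features from several pyramid scales to one width and sums them."""

    def __init__(self, scale_dims: list, out_dim: int):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, out_dim) for d in scale_dims)

    def forward(self, scales: list) -> torch.Tensor:
        # scales: one (batch, dim_i) tensor per pyramid level
        return torch.stack([p(x) for p, x in zip(self.proj, scales)]).sum(dim=0)


if __name__ == "__main__":
    # Toy usage: 2 clips, 8 frames each, 256-d frame features (hypothetical sizes).
    frames = torch.randn(2, 8, 256)
    clip_feat = ContextAwareFrameAttention(256)(frames)        # -> (2, 256)
    fused = MultiScaleFusion([256, 512], 256)(
        [clip_feat, torch.randn(2, 512)]                       # a second, coarser scale
    )
    print(clip_feat.shape, fused.shape)
```

The design choice worth noting is that the attention weights are computed jointly from each frame and the clip context, so frames that are heavily occluded or poorly lit (and thus inconsistent with the clip as a whole) receive lower weight in the pooled representation.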