Visual tracking is an open and exciting field of research. Researchers have made great efforts to approach the ideal of stable object tracking regardless of changes in appearance or circumstances. Owing to their attractive advantages, generative adversarial networks (GANs) have become a promising tool in many fields. However, GAN architectures have not been thoroughly investigated by the visual tracking research community. Inspired by visual tracking via adversarial learning (VITAL), we present a novel network that generates randomly initialized masks for building augmented feature maps using multilayer perceptron (MLP) generative models. These augmented masks extract robust features that do not change over a long temporal span, yielding more robust tracking. Models such as deep convolutional generative adversarial networks (DCGANs) have been proposed to obtain powerful generator architectures by eliminating or minimizing the use of fully connected layers; this study demonstrates that an MLP generator architecture is more robust and efficient than a convolution-only architecture. In addition, to achieve better performance, we used one-sided label smoothing to regularize the discriminator during the training stage and the label smoothing regularization (LSR) method to reduce overfitting of the classifier during the online tracking stage. Experiments show that the proposed model is more robust than the DCGAN model and offers satisfactory performance compared with state-of-the-art deep visual trackers on the OTB-100, VOT2019, and LaSOT datasets.

INDEX TERMS Deep learning, generative adversarial network, multilayer perceptron, visual tracking.
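The following is a minimal sketch of the two mechanisms the abstract names: an MLP generator that maps noise to masks applied to CNN feature maps, and one-sided label smoothing on the discriminator's real targets. All layer widths, the feature-map shape, and the names (MLPMaskGenerator, feat_shape) are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch, assuming PyTorch; sizes are illustrative only.
import torch
import torch.nn as nn

class MLPMaskGenerator(nn.Module):
    """MLP that maps a noise vector to a mask over a C x H x W feature map."""
    def __init__(self, noise_dim=100, feat_shape=(512, 3, 3)):
        super().__init__()
        self.feat_shape = feat_shape
        out_dim = feat_shape[0] * feat_shape[1] * feat_shape[2]
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, out_dim), nn.Sigmoid(),  # mask values in (0, 1)
        )

    def forward(self, z):
        return self.net(z).view(-1, *self.feat_shape)

# Feature-level augmentation: mask the extracted CNN features.
feats = torch.randn(8, 512, 3, 3)      # stand-in backbone features
gen = MLPMaskGenerator()
augmented = feats * gen(torch.randn(8, 100))

# One-sided label smoothing: soften only the "real" target (e.g. 0.9)
# and keep the fake target at 0, which regularizes the discriminator.
bce = nn.BCEWithLogitsLoss()
logits_real = torch.randn(8, 1)        # stand-in discriminator outputs
logits_fake = torch.randn(8, 1)
loss_d = bce(logits_real, torch.full((8, 1), 0.9)) + \
         bce(logits_fake, torch.zeros(8, 1))
```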
Deep learning algorithms provide visual tracking robustness at an unprecedented level, but achieving acceptable performance remains challenging because the features of foreground and background objects change continuously over the course of a video. One of the factors that most affects the robustness of tracking algorithms is the choice of network architecture parameters, especially depth. This study proposes a robust visual tracking model using a very deep generator (RTDG). We built our model on an ordinary convolutional neural network (CNN), which consists of feature extraction and binary classifier networks. We integrated a generative adversarial network (GAN) into the CNN to enhance the tracking results through an adversarial learning process performed during the training phase. We used the discriminator as a classifier and the generator as a store that produces unlabeled feature-level data with different appearances by applying masks to the extracted features. In this study, we investigated the effect of increasing the number of fully connected (FC) layers in the generator of a GAN on tracking robustness. We used a very deep FC network with 22 layers as a high-performance generator for the first time. This generator is used via adversarial learning to augment the positive samples, reducing the gap between data-hungry deep learning algorithms and the available training data and thereby achieving robust visual tracking. Experiments showed that the proposed framework performed well against state-of-the-art trackers on the OTB-100, VOT2019, LaSOT, and UAVDT benchmark datasets.
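To make the "very deep FC generator" idea concrete, here is a minimal sketch of an MLP generator whose depth is a parameter (22 layers, as in the abstract) and whose output masks the extracted features to augment positive samples. The hidden width, noise dimension, and function name make_deep_fc_generator are assumptions for illustration, not the exact RTDG configuration.

```python
# Minimal sketch, assuming PyTorch; widths and masking scheme are assumed.
import torch
import torch.nn as nn

def make_deep_fc_generator(noise_dim=100, hidden=512, n_layers=22,
                           out_dim=512 * 3 * 3):
    """Build an MLP with n_layers Linear layers (input + hidden + output)."""
    layers = [nn.Linear(noise_dim, hidden), nn.ReLU(inplace=True)]
    for _ in range(n_layers - 2):                    # hidden FC layers
        layers += [nn.Linear(hidden, hidden), nn.ReLU(inplace=True)]
    layers += [nn.Linear(hidden, out_dim), nn.Sigmoid()]
    return nn.Sequential(*layers)

gen = make_deep_fc_generator()                       # 22 FC layers
feats = torch.randn(4, 512, 3, 3)                    # positive-sample features
mask = gen(torch.randn(4, 100)).view(4, 512, 3, 3)
augmented_pos = feats * mask                         # extra positive data
```

During adversarial training, such masked features play the role of additional unlabeled positive samples presented to the discriminator/classifier, which is how the generator narrows the gap between the available training data and what the deep classifier needs.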