Multi-object tracking (MOT) is an important research direction in computer vision. Although existing methods handle simple tasks such as pedestrian tracking well, complex downstream tasks that feature uniform appearance and diverse motion remain difficult. Inspired by DETR, tracking-by-attention (TBA) methods use transformers to perform multi-object tracking. However, existing TBA methods still have shortcomings. Gradient conflicts in shared parameters make it difficult to both detect and track objects, and features are not used sufficiently to distinguish similar objects. We introduce FusionTrack to address these issues. It uses a joint track-detection decoder and a score-guided multi-level query fuser to make better use of information both within and across frames. With these improvements, FusionTrack achieves an 11.1% higher HOTA score than the baseline model MOTR on the DanceTrack dataset.
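The reported gain is measured with HOTA, which balances detection quality and association quality. At a localization threshold alpha, HOTA is the geometric mean of detection accuracy DetA = |TP| / (|TP| + |FN| + |FP|) and association accuracy AssA, the mean over true positives c of |TPA(c)| / (|TPA(c)| + |FNA(c)| + |FPA(c)|). The final score averages over alpha = 0.05, 0.10, ..., 0.95. The minimal sketch below assumes the matching step (pairing predictions with ground truth at each alpha) has already produced these counts. The function names and input layout are hypothetical and are not taken from FusionTrack or any benchmark toolkit.

```python
import math
from typing import Sequence


def hota_at_alpha(tp: int, fn: int, fp: int,
                  tpa: Sequence[int], fna: Sequence[int], fpa: Sequence[int]) -> float:
    """HOTA at a single localization threshold alpha (hypothetical helper).

    tp, fn, fp: detection-level counts after matching at this alpha.
    tpa, fna, fpa: association counts |TPA(c)|, |FNA(c)|, |FPA(c)|,
    one entry per true-positive match c.
    """
    if tp == 0:
        return 0.0
    if not (len(tpa) == len(fna) == len(fpa) == tp):
        raise ValueError("need one association triple per true positive")
    det_a = tp / (tp + fn + fp)
    # TPA(c) always contains c itself, so each denominator is at least 1.
    ass_a = sum(a / (a + b + c) for a, b, c in zip(tpa, fna, fpa)) / tp
    return math.sqrt(det_a * ass_a)


def hota(per_alpha: Sequence[dict]) -> float:
    """Average the per-threshold scores (the metric uses 19 thresholds, 0.05 to 0.95)."""
    return sum(hota_at_alpha(**counts) for counts in per_alpha) / len(per_alpha)
```

For example, with 4 true positives, 1 miss, no false positives, and perfectly consistent identities, `hota_at_alpha(4, 1, 0, [4]*4, [0]*4, [0]*4)` gives the square root of 0.8. Because of the geometric mean, a tracker cannot score well by excelling at detection alone. Identity switches between similar-looking objects lower AssA directly.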
<p>The frequency domain plays a crucial role in image processing. However, modern neural networks, such as Convolutional Neural Networks and Transformers, operate only in the temporal domain, which creates a contradiction in how they aggregate information. The frequency domain has distinct advantages in resolving this contradiction. In this paper, we introduce FrequentNet, a frequency-based neural network architecture that jointly uses the temporal and frequency domains. We analyze the challenges that frequency-based neural networks face in combining temporal- and frequency-domain information. We also find that the absence of frequency-domain downsampling methods and the reliance on complex computations degrade the performance of frequency models. To tackle these problems, we introduce a residual connection that separates the temporal and frequency domains to resolve information aliasing. Furthermore, we devise a mapping-based frequency-domain downsampling method. Finally, we use the Discrete Cosine Transform (DCT) as the frequency-domain transformation operator, which avoids the need for complex computations. Comprehensive experiments show that our approach surpasses existing frequency-based backbones across diverse tasks, including image classification, object detection, and semantic segmentation. This superiority stems from the robust and efficient information aggregation capability of the frequency domain.</p>
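The DCT suits this role because it maps a real-valued input to real-valued coefficients, whereas the discrete Fourier transform produces complex numbers. The NumPy sketch below illustrates three ingredients under stated assumptions: an orthonormal 2-D DCT-II, a residual block that keeps an identity path alongside a filter applied to DCT coefficients, and downsampling by keeping only the low-frequency corner of the coefficients. The block structure, the element-wise `weight` filter, and the truncation-based downsampling are hypothetical illustrations. They are not FrequentNet's actual residual design or its mapping-based downsampling method, whose details the abstract does not give.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: entry [k, j] is the k-th cosine sampled at j."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return basis


def dct2(x: np.ndarray) -> np.ndarray:
    """2-D DCT-II of a real (H, W) array; the coefficients are real-valued."""
    h, w = x.shape
    return dct_matrix(h) @ x @ dct_matrix(w).T


def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse of dct2: an orthonormal matrix's transpose is its inverse."""
    h, w = coeffs.shape
    return dct_matrix(h).T @ coeffs @ dct_matrix(w)


def frequency_residual_block(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical block: identity path in the original domain plus an
    element-wise filter (same shape as x) applied to the DCT coefficients."""
    return x + idct2(weight * dct2(x))


def dct_downsample(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Hypothetical downsampling: keep the low-frequency corner of the DCT
    coefficients and invert at the smaller size."""
    h, w = x.shape
    nh, nw = h // factor, w // factor
    low = dct2(x)[:nh, :nw]
    # Orthonormal scaling depends on size; this factor preserves the mean value.
    return idct2(low) * np.sqrt((nh * nw) / (h * w))


img = np.random.default_rng(0).random((8, 8))
filtered = frequency_residual_block(img, np.ones_like(img))  # equals 2 * img here
small = dct_downsample(img)  # shape (4, 4), same mean as img
```

Dropping high-frequency coefficients before inverting at a smaller size acts as a built-in low-pass filter, which is one common way to reduce aliasing when shrinking a feature map. Every step also stays in real arithmetic.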