Road detection is a crucial part of the autonomous driving system, and semantic segmentation is used as the default method for this kind of task. However, the descriptive categories of agroforestry are not directly definable and constrain the semantic segmentation-based method for road detection. This paper proposes a novel road detection approach to overcome the problem mentioned above. Specifically, a novel two-stage method for road detection in an agroforestry environment, namely ARDformer. First, a transformer-based hierarchical feature aggregation network is used for semantic segmentation. After the segmentation network generates the scene mask, the edge extraction algorithm extracts the trail’s edge. It then calculates the periphery of the trail to surround the area where the trail and grass are located. The proposed method is tested on the public agroforestry dataset, and experimental results show that the intersection over union is approximately 0.82, which significantly outperforms the baseline. Moreover, ARDformer is also effective in a real agroforestry environment.
The use of deep neural networks has revolutionised object tracking tasks, and Siamese trackers have emerged as a prominent technique for this purpose. Existing Siamese trackers use a fixed template or template updating technique, but it is prone to overfitting, lacks the capacity to exploit global temporal sequences, and cannot utilise multi‐layer features. As a result, it is challenging to deal with dramatic appearance changes in complicated scenarios. Siamese trackers also struggle to learn background information, which impairs their discriminative ability. Hence, two transformer‐based modules, the Spatio‐Temporal Fusion (ST) module and the Discriminative Enhancement (DE) module, are proposed to improve the performance of Siamese trackers. The ST module leverages cross‐attention to accumulate global temporal cues and generates an attention matrix with ST similarity to enhance the template's adaptability to changes in target appearance. The DE module associates semantically similar points from the template and search area, thereby generating a learnable discriminative mask to enhance the discriminative ability of the Siamese trackers. In addition, a Multi‐Layer ST module (ST + ML) was constructed, which can be integrated into Siamese trackers based on multi‐layer cross‐correlation for further improvement. The authors evaluate the proposed modules on four public datasets and show comparative performance compared to existing Siamese trackers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.