Edge detection is significant as the basis of high-level visual tasks. Most encoder-decoder edge detection methods use convolutional neural networks, such as VGG16 or ResNet, as the encoding network, and studies on designing decoding networks have achieved good results. The Swin Transformer (Swin) has recently attracted much attention in various visual tasks as a possible alternative to convolutional neural networks. Physiological studies have shown that the biological vision system contains two visual pathways that converge in the visual cortex, and that complex information transmission and communication between them is widespread. Inspired by research on Swin and the biological visual pathways, we design a two-pathway encoding network. The first pathway is a fine-tuned Swin; the second pathway mainly comprises depthwise separable convolutions. To simulate attention transmission and feature fusion between the first and second pathways, we design a second-pathway attention module and a pathway fusion module. Our proposed method outperforms the CNN-based state-of-the-art method BDCN on the BSDS500 dataset, while our method and the Transformer-based state-of-the-art method EDTER each have their own performance advantages. In terms of FLOPs and FPS, our method is more efficient than EDTER.
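The efficiency claim rests partly on the second pathway using depthwise separable convolutions. A minimal sketch (not taken from the paper; the channel and kernel sizes below are illustrative assumptions) of why this factorization cuts the parameter count compared with a standard convolution:

```python
def standard_conv_params(c_in, c_out, k):
    # A standard convolution learns one k x k x c_in kernel
    # per output channel (bias terms omitted for clarity).
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    # Depthwise stage: one k x k filter per input channel.
    # Pointwise stage: a 1x1 convolution that mixes channels.
    return c_in * k * k + c_in * c_out

# Hypothetical layer: 64 -> 128 channels with 3x3 kernels.
print(standard_conv_params(64, 128, 3))   # 73728
print(separable_conv_params(64, 128, 3))  # 8768
```

For this illustrative layer the separable form needs roughly 8x fewer parameters, which is consistent with the FLOPs advantage the abstract reports, though the actual savings depend on the network's real channel configuration.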
Edge detection is of great importance to middle- and high-level vision tasks in computer vision, and improving its performance benefits those tasks. Unlike previous edge detection methods that focus only on the decoding network, we propose a new edge detection network composed of a modulation coding network and a decoding network. The modulation coding network combines a modulation enhancement network, designed using the self-attention mechanism of the Transformer, with the encoding network; it is inspired by the selective attention mechanisms of V1, V2, and V4 in biological vision. The modulation enhancement network effectively strengthens the feature extraction ability of the encoding network, selectively extracts global features of the input image, and improves the performance of the entire model. In addition, we design a new decoding network based on the feature-integration function of the IT layer in the biological vision system. Unlike previous decoding networks, it combines top-down and bottom-up decoding: down-sampling decoding extracts additional features, which are then fused with up-sampling decoding features to achieve better performance. We evaluate the proposed method on three publicly available datasets: BSDS500, NYUD-V2, and Barcelona Images for Perceptual Edge Detection (BIPED). It achieves the best performance on the NYUD-V2 and BIPED datasets and the second-best result on BSDS500. Experimental results show that the method is highly competitive among all methods.
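The modulation enhancement network builds on the Transformer's self-attention mechanism. As a hedged, dependency-free sketch (not the paper's implementation; real models first apply learned query/key/value projections, which are omitted here), scaled dot-product self-attention over a set of token vectors can be written as:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # tokens: list of equal-length feature vectors.
    # For simplicity Q = K = V = tokens (no learned projections).
    d = len(tokens[0])
    output = []
    for q in tokens:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Each output vector is a convex combination of the values,
        # which is how attention mixes global context into each position.
        output.append([sum(w * v[j] for w, v in zip(weights, tokens))
                       for j in range(d)])
    return output
```

Because every output position attends to every input position, such a module can realize the selective, global feature extraction the abstract attributes to the modulation enhancement network.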