Image inpainting refers to filling in missing regions of an image using knowledge from the known regions, a task that has flourished with the popularity of deep convolutional networks. Current inpainting methods excel at completing images with small or specifically shaped masks. However, for images with large-proportion corruption, most attention-based and structure-based approaches, though reported to achieve state-of-the-art performance, fail to reconstruct high-quality results because they give insufficient consideration to semantic relevance. To alleviate this problem, in this paper we propose a novel image inpainting approach, the cascading blend network (CBNet), to strengthen the capacity of feature representation. Specifically, we introduce an adjacent transfer attention (ATA) module in the decoder, which preserves contour structure from the deep layers and blends structure and texture information from the shallow layers. In a coarse-to-fine manner, a multi-scale contextual blend (MCB) block is further designed to assemble multi-stage feature information. In addition, to ensure a high-quality fusion of the feature information, extra deep supervision is applied to the intermediate features through a cascaded loss. Qualitative and quantitative experiments on the Paris StreetView, CelebA, and Places2 datasets demonstrate the superior performance of our approach compared with state-of-the-art algorithms.
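To make the structure-texture blending idea concrete, here is a minimal PyTorch sketch in which a deep (structure) feature gates a shallow (texture) feature through channel attention. The module name `AdjacentBlend` and every design choice below are illustrative assumptions, not the paper's actual ATA implementation.

```python
# Hypothetical sketch of an adjacent-transfer-style blend: the deep (structure)
# feature produces per-channel weights that mix it with the shallow (texture)
# feature. Names and design are assumptions; the paper's ATA module may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentBlend(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs project both feature maps into a shared space.
        self.proj_deep = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_shallow = nn.Conv2d(channels, channels, kernel_size=1)
        # Channel attention derived from the deep (structure) feature.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        # Upsample the deep feature to the shallow feature's resolution.
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                             align_corners=False)
        d, s = self.proj_deep(deep), self.proj_shallow(shallow)
        w = self.gate(d)              # per-channel blend weights in (0, 1)
        return w * d + (1.0 - w) * s  # structure-texture blend

blend = AdjacentBlend(64)
out = blend(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```

A gated convex combination like this is one common way to fuse decoder features from adjacent stages; the cascaded loss described above would then supervise such intermediate outputs at each stage.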
With rising traffic congestion, accurately counting vehicles in surveillance images is becoming increasingly difficult. Counting methods based on density maps have achieved tremendous improvement thanks to convolutional neural networks. However, because dense images often exhibit heavy overlap and large-scale variation, neither traditional CNN methods nor fixed-size self-attention transformer methods can count precisely. To address these issues, in this paper we propose a novel vehicle counting approach, the synergism attention network (SAN), which unifies the benefits of transformers and convolutions to perform dense counting effectively. Specifically, a pyramid framework is designed to adaptively exploit multi-level features so that they better fit the counting task. In addition, a synergism transformer (SyT) block is customized, in which a dual-transformer structure captures both global attention and location-aware information. Finally, a Location Attention Cumulation (LAC) module is presented to discover more efficient and meaningful weighting regions. Extensive experiments demonstrate that our model is highly competitive and achieves new state-of-the-art performance on the TRANCOS dataset.
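As an illustration of the dual-branch idea, the following is a minimal PyTorch sketch that combines a global self-attention branch with a depthwise-convolution branch for location-aware cues. The name `DualBranchBlock` and its structure are assumptions made for illustration; the actual SyT block may be organized quite differently.

```python
# Hypothetical sketch of a dual-branch block: one branch applies global
# self-attention over flattened spatial tokens, the other a depthwise conv
# for location-aware information; both are fused residually. This is an
# illustrative assumption, not the paper's SyT implementation.
import torch
import torch.nn as nn

class DualBranchBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Depthwise conv injects positional/local information.
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C)
        t = self.norm(tokens)
        attn_out, _ = self.attn(t, t, t)        # global attention branch
        global_feat = attn_out.transpose(1, 2).reshape(b, c, h, w)
        return x + global_feat + self.local(x)  # fuse both branches residually

block = DualBranchBlock(64)
y = block(torch.randn(1, 64, 16, 16))
print(y.shape)  # torch.Size([1, 64, 16, 16])
```

In a pyramid framework such blocks would be applied at several feature resolutions, with the outputs aggregated into a single density map whose spatial sum gives the vehicle count.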