CTI-GAN: Cross-Text-Image Generative Adversarial Network for Bidirectional Cross-modal Generation

Jing, Changhong; Xue, Bing; Pan, Junren

doi:10.1145/3569966.3569990

Cited by 1 publication

(1 citation statement)

References 28 publications

“…These methods leverage large-scale unlabeled image and text data for pretraining and learning aligned representations for images and text. Some methods incorporate techniques like generative adversarial networks (GANs) and variational autoencoders (VAEs) to generate richer and continuous cross-modal representations [36,37]. Others introduce image and text reconstruction tasks in self-supervised learning to further enhance the model's learning capability [38,39].…”

Section: Cross-modal Correlation Algorithm (mentioning)

confidence: 99%

Strong and Weak Supervision Combined with CLIP for Water Surface Garbage Detection

Ma, Chu, Liu et al. 2023

Water

Water surface garbage has a significant impact on the protection of water environments and ecological balance, making water surface garbage object detection a critical task. Traditional supervised object detection methods require a large amount of annotated data. To address this issue, we propose a method that combines strong and weak supervision with CLIP (Contrastive Language–Image Pretraining) for water surface garbage object detection. First, we train on a dataset annotated with strong supervision, using traditional object detection algorithms to learn the location information of water surface garbage. Then, we input the water surface garbage images into CLIP’s visual encoder to obtain visual feature representations. Simultaneously, we train CLIP’s text encoder using textual description annotations to obtain textual feature representations of the images. By fusing the visual and textual features, we obtain comprehensive feature representations. In the weak supervision training phase, we input the comprehensive feature representations into the object detection model and employ a training strategy that combines strong and weak supervision to detect and localize water surface garbage. To further improve the model’s performance, we introduce attention mechanisms and data augmentation techniques to enhance the model’s focus and robustness towards water surface garbage. By conducting experiments on two water surface garbage datasets, we validate the effectiveness of the proposed method based on the combination of strong and weak supervision with CLIP for water surface garbage object detection tasks.
