Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022
DOI: 10.1145/3511808.3557067
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce

Cited by 9 publications (1 citation statement)
References 19 publications
“…Recently, with the emergence of datasets that cover multiple modalities, studies using multimodal information have emerged. Shin et al. [25] proposed e-CLIP, which can be deployed on multiple e-commerce downstream tasks, building on an approach [26] that utilizes both visual and language information. Dong et al. [13] proposed the Self-harmonized Contrastive Learning (SCALE) framework, which unifies several modalities into a single model through an adaptive mechanism for fusing features.…”
Section: Related Work
Confidence: 99%
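
As context for the quoted statement: the approach e-CLIP builds on [26] is CLIP's symmetric contrastive objective over paired images and texts. The sketch below is a minimal, generic PyTorch version of that objective, not the authors' implementation; the function name, temperature value, and tensor shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) projections from the two encoders.
    Matching pairs share the same row index; all other rows act as negatives.
    """
    # L2-normalize so the dot product below is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

Normalizing the embeddings before the dot product makes the logits cosine similarities, which is what the temperature-scaled softmax in CLIP-style training expects.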