2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.01856

BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild

citations
Cited by 18 publications
(15 citation statements)
references
References 27 publications
“…In Ramesh et al [33] the authors demonstrate that applying the Diffusion Prior and conditioning over the resulting image embeddings attains improved diversity while enabling image variations, interpolations, and editing. Several works have adopted the use of a Diffusion Prior for text-guided video synthesis [13,43] and 3D generation and texturing [26,51]. The use of Diffusion Prior for text-guided synthesis is further analyzed in [1,53].…”
Section: Related Work (mentioning)
confidence: 99%
“…In [1], the authors put forward TextRnet and a new text segmentation dataset, where TextRnet utilizes the unique text prior such as texture diversity and non-convex contours to achieve state-of-the-art performance on text segmentation benchmarks. The authors of [7] mainly focus on Bi-Lingual text segmentation, they propose a Bi-Lingual text dataset along with PGTSNet which contains a plug-in text-highlighting module and a text perceptual module to help distinguish between text languages. Most recently, Textformer [8] leverages the similarities between text components by proposing a multi-level transformer framework to enhance the interaction between text components and image features at different granularities.…”
Section: Text Segmentation Methods (mentioning)
confidence: 99%
“…Diffusion model has emerged the most advanced deep generation model and has been applied in a wide range of fields, including image super resolution [RBL*22, SHC*22, DMH21], image inpainting [LDR*22, XZL*23], image editing [MHS*21, KZL*23, YGZ*23, ZHG*23], semantic segmentation [HAZ*22b, BRV*21, BKC*22, GMJS22], video generation [HNM*22, HCS*22, ZCP*22, QCZ*23], natural language processing [AJH*21, GLF*22, HKT22, LTG*22], point cloud completion [LWYL22, LH21,VWG*22, ZDW21] and multi‐modal generation [RLJ*23, TRG*22, PVG*21, SPH*22, ALF22, BNX*23, GCB*22, NDR*21, PJBM22, XWC*23, LGT*23], as well as interdisciplinary applications in fields such as and medical image reconstruction [CSY22, CY22, PGZ*22a, PGZ*22b]. Notably, in the area of high‐resolution image generation, the impact of diffusion models has surpassed that of GANs.…”
Section: Related Work (mentioning)
confidence: 99%
“…This is because diffusion models are much easier to train and offer better diversity. In addition, it can be observed that diffusion models deliver promising performance on multimodal data [RLJ*23, TRG*22, PVG*21, SPH*22, ALF22, BNX*23, GCB*22, NDR*21, PJBM22, XWC*23, LGT*23] and conditional generations [NSL*23, ZLW*23, BNX*23, GCB*22, NDR*21]. Moreover, several studies have introduced diffusion models into 3D geometry generations such as point cloud [VWG*22], while extending them to directly generate neural implicit representation remains difficult.…”
Section: Introduction (mentioning)
confidence: 99%