2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw50498.2020.00279
Textual Visual Semantic Dataset for Text Spotting

Abstract: Text Spotting in the wild consists of detecting and recognizing text appearing in images (e.g. signboards, traffic signals, or brands in clothing or objects). This is a challenging problem due to the complexity of the context where texts appear (uneven backgrounds, shading, occlusions, perspective distortions, etc.). Only a few approaches try to exploit the relation between text and its surrounding environment to better recognize text in the scene. In this paper, we propose a visual context dataset for Text Spotting…
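The abstract and the citing papers below describe the dataset as attaching a caption, object names, and place labels to each COCO-Text scene image. A minimal sketch of what one such record might look like follows; all field names and values are hypothetical illustrations, not the dataset's released schema.

```python
# Hypothetical record layout for the visual context dataset, inferred from
# the abstract (caption + object names + places per COCO-Text image).
# Field names are illustrative assumptions, not the actual schema.
sample_record = {
    "image_id": "COCO_train2014_000000123456",   # assumed COCO-Text image id
    "gt_text": "stop",                            # ground-truth scene text
    "caption": "a red stop sign at a street corner",
    "objects": ["stop sign", "car", "traffic light"],  # object names in image
    "places": ["street", "crosswalk"],                 # scene/place labels
}
```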

Cited by 5 publications (5 citation statements) · References 36 publications · Citation types: 0 supporting, 5 mentioning, 0 contrasting
“…To obtain the visual context o from each image I, we use out-of-the-box classifiers to extract the image context information o(I). Specifically, following (Sabir et al., 2023), the objects extracted from all pre-trained models are obtained by extracting the top-3 object classes/categories (excluding the person category) from each classifier after filtering out instances with (1) the cosine distance between…”
Section: Visual Context Information (mentioning)
confidence: 99%
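The quoted procedure (top-3 object classes from off-the-shelf classifiers, excluding the person category, filtered by a cosine-distance score) can be sketched as below. The classifier choice (torchvision's ResNet-50) and the use of ImageNet labels are assumptions for illustration; the excerpt does not specify the cited work's exact models or threshold.

```python
# A minimal sketch of the quoted extraction step: top-3 object labels from
# a pre-trained classifier, excluding "person", plus the cosine-distance
# score used for filtering. ResNet-50/ImageNet are assumed stand-ins for
# the unspecified "out-of-the-box classifiers".
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]

def top3_objects(image_path: str) -> list[str]:
    """Return the top-3 predicted object labels, excluding 'person'."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = model(img).softmax(dim=1).squeeze(0)
    ranked = probs.argsort(descending=True).tolist()
    labels = [categories[i] for i in ranked if categories[i] != "person"]
    return labels[:3]

def cosine_distance(u: torch.Tensor, v: torch.Tensor) -> float:
    """Cosine distance = 1 - cosine similarity, the filtering score."""
    return 1.0 - torch.nn.functional.cosine_similarity(u, v, dim=0).item()
```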
“…In this study, we reviewed 17 datasets, containing both regular and irregular types of text. We focused on the most commonly‐used datasets for method evaluation, excluding recent publications that have not been included in most performance analysis methods [47, 48] and datasets that focus on artistic images, as their use is sporadic in current state‐of‐the‐art approaches [23]. Three of these datasets focus solely on the detection task, while four focus on the recognition problem.…”
Section: Datasets and Performance (mentioning)
confidence: 99%
“…Visual context information has further been used by Sabir et al. (2020) to train/tune and evaluate existing semantic similarity‐based text spotting baselines for re‐ranking the produced text hypotheses, resulting in improved text spotting accuracy. A visual context dataset has been introduced for text spotting in the wild by adding information about the scene images of the publicly available COCO‐text dataset, such as a textual image description (caption) and the names of objects and their places in images.…”
Section: Spotting-Based Mining Approaches (mentioning)
confidence: 99%
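The re-ranking described above amounts to re-scoring a spotter's candidate transcriptions by their semantic similarity to the scene's visual context. A minimal sketch follows, assuming a sentence-transformers embedder and a simple linear combination weight alpha; both are illustrative choices, not the cited paper's exact method.

```python
# A minimal sketch of semantic re-ranking: candidate text hypotheses from a
# text spotter are re-scored by similarity to the visual context labels.
# The embedding model and alpha are assumptions for illustration.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder

def rerank(hypotheses: list[tuple[str, float]],
           visual_context: list[str],
           alpha: float = 0.5) -> list[tuple[str, float]]:
    """Combine spotter confidence with semantic similarity between each
    hypothesis and the concatenated visual context labels."""
    ctx_emb = embedder.encode(" ".join(visual_context), convert_to_tensor=True)
    rescored = []
    for text, conf in hypotheses:
        hyp_emb = embedder.encode(text, convert_to_tensor=True)
        sim = util.cos_sim(hyp_emb, ctx_emb).item()
        rescored.append((text, alpha * conf + (1 - alpha) * sim))
    return sorted(rescored, key=lambda p: p[1], reverse=True)

# Example: "dunkin" should outrank the OCR error "dunkln" given a
# coffee-shop context, even with a slightly lower spotter confidence.
print(rerank([("dunkln", 0.62), ("dunkin", 0.58)],
             ["coffee shop", "cafe", "storefront"]))
```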
“…A visual context dataset has been introduced for text spotting in the wild by adding information about the scene images of the publicly available COCO‐text dataset, such as a textual image description (caption) and the names of objects and their places in images. This enables researchers to use semantic relations between texts and scenes in their text spotting systems (Sabir et al., 2020).…”
Section: Spotting-Based Mining Approaches (mentioning)
confidence: 99%