2019
DOI: 10.1007/978-3-030-21074-8_11
E2E-MLT - An Unconstrained End-to-End Method for Multi-language Scene Text

Abstract: An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks. E2E-MLT is the first published multi-language OCR for scene text. While trained in a multi-language setup, E2E-MLT demonstrates competitive performance when compared to other methods trained for English scene text alone. The experiments show that obtaining accurate multi-language multi-sc…

Cited by 58 publications (56 citation statements) | References 46 publications
“…In this section, we compare our network with the state-of-the-art approaches [1], [3], [10], [11], [16], [18], [20], [21], [23], [24], [27]-[29], [43], [45], [49], [65], [66], [69], [71]-[73] on six different benchmark datasets. We consider recall, precision, and f-measure as the metrics for evaluating detection accuracy.…”
Section: Comparison With State-of-the-art Results
confidence: 99%
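The statement above names recall, precision, and f-measure as the detection metrics. A minimal sketch of how these are computed from matched counts follows; the IoU-based matching that produces the true-positive count is assumed to have been done already, and the example numbers are illustrative only.

```python
# Detection metrics from raw counts: precision = TP / predictions,
# recall = TP / ground truths, f-measure = harmonic mean of the two.

def detection_scores(true_pos: int, num_pred: int, num_gt: int):
    """Return (precision, recall, f_measure) from raw match counts."""
    precision = true_pos / num_pred if num_pred else 0.0
    recall = true_pos / num_gt if num_gt else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

# Hypothetical evaluation: 80 correct detections out of 100 predicted
# boxes, against 90 ground-truth boxes.
p, r, f = detection_scores(true_pos=80, num_pred=100, num_gt=90)
print(p, r, f)
```

The harmonic mean penalizes a large gap between precision and recall, which is why f-measure (rather than their arithmetic mean) is the standard single-number summary in detection benchmarks.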
“…Each proposal uses connectionist temporal classification (CTC) to decode multi-language text. E2E-MLT [71] is a popular multilingual optical character recognition system for scene text detection and recognition. It uses a single shared fully convolutional network.…”
Section: Scene Text Spotting
confidence: 99%
“…They reported character recognition rates of 98.17% and 97.44% on the Arabic news video text datasets and 75.05% on the synthetic Arabic natural scene character image dataset. A multi-language end-to-end scene text recognition system was presented in [58]. The authors used the ResNet50 [59] network for text localization, VGG16 [60] pre-trained on ImageNet [61] data for script identification, and an OCR module proposed in [62] for multi-language text recognition.…”
Section: IEEE Access
confidence: 99%
“…In recent years, with the renaissance of convolutional neural networks (CNNs), many deep learning-based methods [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34] have made remarkable achievements in text detection, and these methods can be divided into top-down and bottom-up methods. The top-down methods [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], also commonly referred to as regression-based methods, usually adopt popular object detection pipelines to first detect text on the block level and then break a block into the word or line level if necessary. However, because of the structural limitations of the corresponding CNN models, these methods cannot efficiently handle long text and arbitrari…”
Section: Introduction
confidence: 99%
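The regression-based detection pipelines mentioned above match predicted boxes to ground truth by box overlap. A minimal sketch of the standard intersection-over-union (IoU) computation for axis-aligned boxes follows; the example boxes are hypothetical.

```python
# Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes.

def iou(a, b):
    """Return IoU in [0, 1]; 0 when the boxes do not overlap."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ≈ 0.333
```

Axis-aligned IoU is exactly where such pipelines struggle with the long or arbitrarily oriented text the quote refers to: a rotated or curved text line fills only a small fraction of its bounding rectangle, so rotated-box or segmentation-based overlap measures are used instead.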