2018 IEEE International Conference on Big Data (Big Data) 2018
DOI: 10.1109/bigdata.2018.8622129

A unified scheme of text localization and structured data extraction for joint OCR and data mining


Cited by 9 publications (8 citation statements)
References 30 publications
“…Few printed documents like the receipts and the invoices may contain handwritten remarks or characters or numerals, requiring advanced OCR techniques for the recognition [7]. Datasets are also available in different languages like Chinese passport and medical receipt dataset [24]. Language-ambiguity, poor morphology, language-dependent annotations, and the unavailability of large labeled and annotated corpus in the public dataset could be an issue [115], [116].…”
Section: Challenges/Issues with Existing Datasets (mentioning)
“…We will cover OCR, RPA and NER as the three approaches/techniques. Text extraction is the main stage in automating document image processing [24], [32], [50], [109]. The document images can be compressed or uncompressed, grayscale or color and the text in the images can be editable or non-editable [64], [59], [39].…”
Section: RQ4-AI Approaches Used for Unstructured Document Processing (mentioning)