Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems 2018
DOI: 10.1145/3264746.3264793

Automatic knowledge extraction from OCR documents using hierarchical document analysis

Abstract: Industries can improve their business efficiency by analyzing and extracting relevant knowledge from large numbers of documents. Manual knowledge extraction from large volumes of documents is labor-intensive, unscalable, and challenging. Consequently, there have been a number of attempts to develop intelligent systems that automatically extract relevant knowledge from OCR documents. Moreover, such an automatic system can improve the capability of a search engine by providing application-specific domain knowledge. Howe…

Citations: cited by 3 publications (6 citation statements)
References: 25 publications
“…The maximum values became (359.99, 200.31, 113.71, 359.98, 223.59, 211.74, 189.75, 1399.1). The number of bits needed to code each value is (9, 8, 7, 9, 8, 8, 8, 11), obtained by taking log2 of each maximum value and rounding it up. This gives a total of 68 bits for each entry in the dataset.…”
Section: unspecified (mentioning)
confidence: 99%
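
The arithmetic in the statement above can be checked with a short sketch. It assumes, as the statement describes, that each field's width is the ceiling of log2 of its maximum value; the names `max_values` and `bits_needed` are illustrative and do not come from the citing paper. Note that this rule only sizes the integer range of each field and says nothing about how fractional digits would be stored.

```python
import math

# Maximum values quoted in the citation statement above.
max_values = [359.99, 200.31, 113.71, 359.98, 223.59, 211.74, 189.75, 1399.1]


def bits_needed(max_value: float) -> int:
    """Return ceil(log2(max_value)), the rule described in the statement."""
    return math.ceil(math.log2(max_value))


# One width per field, then the total width of a dataset entry.
bits_per_field = [bits_needed(v) for v in max_values]
total_bits = sum(bits_per_field)

print(bits_per_field)  # [9, 8, 7, 9, 8, 8, 8, 11]
print(total_bits)      # 68
```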
“…Any OCR system is built on finding suitable features of the character images that can be used to classify the image, and then developing the classification algorithm. There are many surveys that explain several feature extraction methods and classification and recognition algorithms [3][4][5][6][7]. The character document image's physical and logical structure, and how its different parts relate, lead to the proper feature extraction method [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…Different natural language processing models are widely used for knowledge extraction from documents [16]. An efficient knowledge extraction framework was proposed that can extract information from thousands of unstructured OCR contract documents [4]. The framework used rule-based methods to convert unstructured OCR documents into a hierarchically structured JSON format, and then used a vector space model to retrieve ranked relevant information from the documents.…”
Section: Related Work (mentioning)
confidence: 99%
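
The retrieval step mentioned in the statement above can be illustrated with a minimal vector space model. This is a sketch under stated assumptions, not the cited framework's implementation: it assumes TF-IDF term weights with cosine similarity and whitespace tokenization, and the function names and the sample `sections` strings are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption for this sketch)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) and the IDF table for the corpus."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that every corpus term gets a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]


# Hypothetical contract sections, e.g. leaf text from a hierarchical JSON file.
sections = [
    "payment terms invoice due within thirty days",
    "termination notice period ninety days",
    "governing law and jurisdiction",
]
print(rank("payment invoice", sections))
```

Ties are broken by document order, so sections that share no terms with the query keep their original sequence at the end of the ranking.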
“…Manual extraction of relevant knowledge from a large volume of articles is labor-intensive, unscalable, and challenging [4,15]. Consequently, there have been several attempts to develop intelligent systems to automatically extract relevant knowledge from many unstructured documents.…”
Section: Introduction (mentioning)
confidence: 99%
“…Also, handcrafted patterns cannot capture the complex semantic information in the documents. M. Mohammad et al. (2018) proposed constructing structurally formatted data by extracting document layout features and then analyzing the changes in those features. However, according to practical experience, it is hard to rely on document authors to describe a process while strictly following the layout rules.…”
Section: Introduction (mentioning)
confidence: 99%
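
The idea of deriving structure from changes in layout features can be sketched as follows. This is not the cited authors' method: it assumes a single hypothetical feature (font size), treats any line larger than the body size as a heading, and nests headings by size. The function name, the input format, and the sample lines are all illustrative.

```python
import json


def build_hierarchy(lines, body_size):
    """Nest (font_size, text) lines into a tree based on font size changes.

    Lines larger than body_size are treated as headings; a heading becomes
    a child of the nearest preceding heading with a strictly larger font.
    """
    root = {"title": "document", "text": [], "children": []}
    # Stack of (font_size, node); the root sentinel is never popped.
    stack = [(float("inf"), root)]
    for size, text in lines:
        if size > body_size:
            # Close sections whose heading is the same size or smaller.
            while stack[-1][0] <= size:
                stack.pop()
            node = {"title": text, "text": [], "children": []}
            stack[-1][1]["children"].append(node)
            stack.append((size, node))
        else:
            # Body text belongs to the innermost open section.
            stack[-1][1]["text"].append(text)
    return root


# Hypothetical OCR output: (font size in points, line text).
lines = [
    (16, "1 Scope"),
    (10, "This contract covers..."),
    (13, "1.1 Services"),
    (10, "Services include..."),
    (16, "2 Payment"),
    (10, "Invoices are due..."),
]
tree = build_hierarchy(lines, body_size=10)
print(json.dumps(tree, indent=2))
```

As the statement above points out, a rule like this only works when documents follow their layout conventions consistently; a heading set in body-size text would be absorbed into the preceding section.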