2022
DOI: 10.11591/ijece.v12i1.pp1018-1029

Automated hierarchical classification of scanned documents using convolutional neural network and regular expression

Abstract: This research proposed automated hierarchical classification of scanned documents whose content has unstructured text and special patterns (specific, short strings), using a convolutional neural network (CNN) and a regular expression method (REM). The research data are digital correspondence documents in PDF image format from Pusat Data Teknologi dan Informasi (Technology and Information Data Center). The document hierarchy covers type of letter, type of manuscript letter, orig…

Cited by 7 publications (12 citation statements)
References 25 publications
“…The pre-processing stage involves the development of the automatic hierarchical classification model of digital correspondence documents based on the text classification approach in line with [29] that have been successfully classified digital correspondence documents into 4 manuscripts of letter, 5 type of letter, 15 origins of letter, and 25 subjects of letter with 94% accuracy by combining CNN and regular expression methods. The automatic hierarchical classification model developed in this research covers all the criteria according to the hierarchy of digital correspondence documents which include 5 manuscripts of letters, 22 types of letters, 15 origins of letters, and 25 subjects of letters.…”
Section: Pre-processing Stage
mentioning
confidence: 99%
“…The automatic hierarchical classification model developed in this research covers all the criteria according to the hierarchy of digital correspondence documents which include 5 manuscripts of letters, 22 types of letters, 15 origins of letters, and 25 subjects of letters. The manuscript, origin, and subject were classified using the same regular expression pattern applied in the previous research [29] while the types of letters were based on the CNN. Moreover, this model was further applied in the archiving stage.…”
Section: Pre-processing Stage
mentioning
confidence: 99%
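
The division of labour described in the statement above (regular expressions for manuscript, origin, and subject; a CNN for the type of letter) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the label names, the patterns, and the `type_model` callable standing in for a trained CNN are hypothetical, and none of them come from the cited papers, which do not show their actual expressions here.

```python
import re
from typing import Callable, Dict, Optional, Pattern

# Hypothetical patterns for illustration only; the expressions used in
# the cited work are not reproduced in the source text.
ORIGIN_PATTERNS: Dict[str, Pattern[str]] = {
    "finance_bureau": re.compile(r"\bBiro\s+Keuangan\b", re.IGNORECASE),
    "data_center": re.compile(r"\bPusat\s+Data\b", re.IGNORECASE),
}
SUBJECT_PATTERNS: Dict[str, Pattern[str]] = {
    "invitation": re.compile(r"\bundangan\b", re.IGNORECASE),
    "report": re.compile(r"\blaporan\b", re.IGNORECASE),
}


def match_label(text: str, patterns: Dict[str, Pattern[str]]) -> Optional[str]:
    """Return the first label whose pattern occurs in the text, else None."""
    for label, pattern in patterns.items():
        if pattern.search(text):
            return label
    return None


def classify(text: str, type_model: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Combine a learned type classifier with regex-based hierarchy levels.

    `type_model` is a stand-in (assumption) for a trained CNN text classifier.
    """
    return {
        "type": type_model(text),
        "origin": match_label(text, ORIGIN_PATTERNS),
        "subject": match_label(text, SUBJECT_PATTERNS),
    }


if __name__ == "__main__":
    ocr_text = "Undangan rapat dari Pusat Data Teknologi dan Informasi"
    # A constant function replaces the CNN in this sketch.
    print(classify(ocr_text, lambda _: "official_letter"))
```

In this sketch each hierarchy level is resolved independently, so a level with no matching pattern is reported as `None` rather than guessed.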