“…

|  | Type of Content | Availability | Size of Dataset |
| --- | --- | --- | --- |
| ACTIV2 [12] | Embedded words | Public | 10,415 text images |
| QTID [13] | Synthetic words | Private | 309,720 words and 249,428 characters |
| IFN/ENIT [14] | Handwritten words | Public | 115,000 words and 212,000 characters |
| AHDB [15] | Handwritten words and digits | Private | 30,000 words |
| APTI [16] | Printed words | Public | 113,284 words and 648,280 characters |
| HACDB [17] | Handwritten characters | Public | 6600 characters and 50 writers |
| UPTI [18] | Printed text lines | Public | 10,000 text lines |
| Digital Jawi [19] | Jawi paleography images | Public | 168 words and 1524 characters |
| KHATT [20] | Handwritten text lines | Public | 9327 lines, 165,890 words and 589,924 characters |
| ALIF [21] | Embedded text lines | Upon request | 1804 words and 89,819 characters |
| ACTIV [22] | Embedded text lines | Public | 4824 lines and 21,520 words |
| SmartATID [23] | Printed and handwritten pages | Public | 9088 pages |
| Degraded historical [24] | Handwritten documents | Public | 10 handwritten images and 10 printed images |
| Printed PAW [25] | Printed subwords | Upon request | 415,280 unique words and 550,000 sub words |
| Checks [26] | Handwritten subwords and digits | Private | 29,498 subwords and 15,148 digits |
| Numeral [27] | Handwritten digits | Public | 21,120 digits and 44 writers |
| Forms [28] | Handwritten characters | Private | 15,800 characters and 500 writers |
| KAFD [29] | Printed pages and lines | Public | 28,767 pages and 644,006 lines |
| AHDBIFTR [30] | Handwritten images | Public | 497 word images and 5 writers |
| ARABASE [31] | Handwritten text | Public | 47,000 words and 500 free Arabic sentences |
| CEDAR [32] | Handwritten pages | Private | 20,000 words, 10 writers, and 100 documents |
| CENPARMI [26] | Handwritten subwords and digits | Public | 6000 digit images |

Shafi and Zia [33] surveyed automatic Urdu text recognition techniques and described the algorithms, techniques, datasets, challenges, and future directions for Urdu OCR. Additionally, [34] reviewed the availability of datasets and suggested more training data to address the unique challenges of OCR systems.…”