2017
DOI: 10.1134/s1054661817030142

Algorithm for segmenting script-dependant portion in a bilingual Optical Character Recognition system

citations
Cited by 7 publications
(3 citation statements)
references
References 3 publications
“…Now the segments of the test page which are Telugu characters are given to the trained model for prediction. Later SoftMax is applied to classify images, which uses the expression to generate discrete probabilities of all classes between 0 and 1 given by (8),…”
Section: Character Segmentation and Feature Extraction (mentioning)
confidence: 99%
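
The quoted statement refers to SoftMax turning a model's raw class scores into probabilities between 0 and 1. The citing paper's equation (8) is not reproduced in the excerpt, so the following is only a minimal sketch of the standard softmax definition, not necessarily the exact form used there. The function name, the example scores, and the three-class setup are illustrative assumptions.

```python
import math

def softmax(scores):
    """Map raw class scores to probabilities in (0, 1) that sum to 1."""
    # Subtracting the maximum score improves numerical stability
    # and does not change the resulting probabilities.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three character classes (illustrative values only).
probs = softmax([2.0, 1.0, 0.1])

# The predicted class is the index with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

The class with the largest raw score always receives the largest probability, so softmax changes the scale of the scores but not the predicted label.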
“…Third phase is character segmentation, a crucial step in OCR [7], [8]. This phase depends on language.…”
Section: Introduction (mentioning)
confidence: 99%
“…(2011) presented a robust and fast word-wise identification method by isolating pages, blocks, and paragraphs from real-world bi-lingual datasets. Their method performed preprocessing first, then found texture features at page and block levels and structural features at word level, then used profile based segmentation for blocks and words, and finally, performed SVM classification at page and block levels and Rejection based classification using AdaBoost at the word level. The script identification method proposed by Rani, Dhir, and Lehal (2012) recognized bi-script words through preprocessing, feature extraction of structural, Gabor and Discrete Cosine Transforms (DCT), and finally, classification by SVM, K Nearest Neighbor (KNN) and Probabilistic Neural Network (PNN). Next word-level identification technique classified bi-script documents by using word directional energy distribution features of Gabor filters along with suitable frequencies and orientations (Chaudhari & Gulati, 2016). Bebartta and Mohanty (2017) 2018) reviewed word recognition techniques for Indic and non-Indic scripts by discussing their experimental results, databases, recognition accuracies, potential benefits, and future recommendations for Indic scripts, such as Bengali, Devanagari, Gujarati, Gurumukhi, Kannada, Maithili, Malayalam, Oriya, Tamil, and Telugu, and non-Indic scripts, such as Arabic, Chinese, Dutch, Japanese, Latin/Roman, Mongolian, Persian, Thai, and Uyghur. They discussed word recognition types, approaches, needs, advantages, disadvantages, issues, and challenges along with survey protocols, their development, conduct, result analysis, reporting, and findings. They found the need of advanced printed/handwritten document recognition techniques and word segmentation algorithms to achieve high word recognition accuracy. Ghosh and Valveny (2018) proposed a fast, segmentation-free, word spotting method, and followed the steps of atomic bounding box generation to create text box and filter proposals, Pyramidal Histogram of Characters (PHOC) feature encoding to evaluate these proposals, performed indexing by using Pyramidal Histogram of Character N-grams (PHON), and finally, attribute model learning by LSVM. Their method showed the performance for query-by-string and query-by-example with standard single and multi-writer datasets. Table 2 provides the comparisons among word recognition and spotting techniques for the year range 2014 to 2018, which discriminates these techniques with the indicators of printed and handwritten, script and language, dataset and size, classifier used, the accuracy achieved, and constraints, errors, and future directions.…”
Section: Script Identification and Recognition (mentioning)
confidence: 99%