2014 11th IAPR International Workshop on Document Analysis Systems
DOI: 10.1109/das.2014.75

Context-Dependent Confusions Rules for Building Error Model Using Weighted Finite State Transducers for OCR Post-Processing

Cited by 7 publications (2 citation statements)
References 9 publications
“…The classical technique, widely used in different fields, to find the maximum likelihood path on a finite-state machine and to perform error-correcting parsing on a regular grammar is the Viterbi Algorithm [10,11]. In [12], OCR errors are corrected using context-dependent confusion rules and a language model, both represented by means of Finite-State Transducers (WFST). The confusion rules are extracted from a training corpus by aligning the misrecognized words of the OCR output with their corresponding ground truth.…”
Section: Introduction (mentioning)
confidence: 99%
“…Hassan Awadallah et al [2008] use finite-state automata (FSA) to propose candidate corrections within a specified edit distance of the misspelled words. Al Azawi and Breuel [2014] use weighted finite-state transducers (WFST) to model contextual symbol-confusion information obtained with the Levenshtein [1966] algorithm, and to combine that information with the output of an OCR system and with the output language model, both of which are also modeled as WFSTs.…”
Section: State of the Art (unclassified)
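
The citation statements above describe confusion rules that are extracted by aligning misrecognized OCR words with their ground truth using the Levenshtein algorithm. The following is a minimal sketch of that idea, not the cited paper's implementation: the function names `align` and `confusion_rules`, the one-character left/right context window, the `^`/`$` boundary markers, and the toy word pairs are all assumptions made for illustration.

```python
from collections import Counter


def align(ocr, truth):
    """Levenshtein-align two strings.

    Returns a list of (ocr_char, truth_char) pairs; an empty string
    marks a gap (a character that was inserted or deleted).
    """
    n, m = len(ocr), len(truth)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (ocr[i - 1] != truth[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Backtrace from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dist[i][j] == dist[i - 1][j - 1] + (ocr[i - 1] != truth[j - 1])):
            pairs.append((ocr[i - 1], truth[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((ocr[i - 1], ""))   # extra OCR character
            i -= 1
        else:
            pairs.append(("", truth[j - 1]))  # character missing from OCR
            j -= 1
    pairs.reverse()
    return pairs


def confusion_rules(word_pairs):
    """Count (left, ocr_symbol, truth_symbol, right) confusions.

    The context here is the neighbouring ground-truth symbol on each side
    (an illustrative choice); "^" and "$" mark the word boundaries.
    """
    counts = Counter()
    for ocr, truth in word_pairs:
        pairs = align(ocr, truth)
        for k, (o, t) in enumerate(pairs):
            if o == t:
                continue
            left = pairs[k - 1][1] if k > 0 else "^"
            right = pairs[k + 1][1] if k + 1 < len(pairs) else "$"
            counts[(left, o, t, right)] += 1
    return counts


# Toy training pairs (hypothetical): OCR output vs. ground truth.
rules = confusion_rules([("c1ean", "clean"), ("c1ear", "clear"), ("cat", "cart")])
```

In a WFST-based error model, counts like these could then be normalized and turned into arc weights; that step is omitted here because the statements above do not specify how the weights are computed.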