2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR) 2018
DOI: 10.1109/icfhr-2018.2018.00052

Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm

Abstract: Recurrent Neural Networks (RNNs) are used for sequence recognition tasks such as Handwritten Text Recognition (HTR) or speech recognition. If trained with the Connectionist Temporal Classification (CTC) loss function, the output of such an RNN is a matrix containing character probabilities for each time-step. A CTC decoding algorithm maps these character probabilities to the final text. Token passing is such an algorithm and is able to constrain the recognized text to a sequence of dictionary words. However, th…
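To make the mapping from a character-probability matrix to text concrete, the following is a minimal sketch of best path (greedy) decoding, the simplest CTC decoder. It is not the word beam search algorithm the paper proposes, and it applies no dictionary constraint. The function name `best_path_decode`, the toy alphabet, the probability values, and the convention that the blank label sits in the last column are all assumptions made for illustration.

```python
def best_path_decode(probs, alphabet, blank=None):
    """Greedy CTC decoding sketch.

    probs    -- list of T rows, one per time-step; each row holds one
                probability per character plus one for the CTC blank.
    alphabet -- string of characters, indexed like the columns of probs.
    blank    -- column index of the blank label (assumed: last column).
    """
    if blank is None:
        blank = len(probs[0]) - 1

    decoded = []
    previous = None
    for row in probs:
        # Pick the most probable label at this time-step.
        label = max(range(len(row)), key=row.__getitem__)
        # Collapse repeated labels, then drop blanks.
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


# Toy matrix for the alphabet "ab"; the third column is the blank.
alphabet = "ab"
probs = [
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.2, 0.1],  # a (repeat, collapsed)
    [0.1, 0.1, 0.8],  # blank (separates the two a's)
    [0.6, 0.3, 0.1],  # a
    [0.2, 0.7, 0.1],  # b
]
decoded = best_path_decode(probs, alphabet)
print(decoded)
```

Because this decoder looks at each time-step in isolation, it can emit character sequences that are not words at all, which is the gap that dictionary-constrained decoders such as token passing aim to close.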

Cited by 77 publications (82 citation statements)
References 10 publications
“…Finally, the segmented words are fed into the model. The Word Beam Search method is used [13]. From the result, we achieved 62.85 % accuracy in recognizing the characters.…”
Section: Discussion (mentioning)
confidence: 99%
“…The word beam search method [13] is used to map the input sequence to the output sequence. The resulting output sequence is the required output image after decoding.…”
Section: E Decoding (mentioning)
confidence: 99%
“…The WBS decoder is placed just following the CTC layers for output decoding. The main advantages of the WBS decoder [44] over token passing decoder are:…”
Section: E Word Beam Search (WBS) Decoder (mentioning)
confidence: 99%
“…We referred to the independent labelling of each time step, or frame. Figure 2 depicts the best path decoding example [26,27] for a 1 s audio file.…”
Section: Phoneme Recognition and Time Alignment (mentioning)
confidence: 99%
“…We referred to the independent labelling of each time step, or frame. Figure 2 depicts the best path decoding example [26,27] for a 1 s audio file.

ch + sh   0.00  …  0.00  0.00  0.00  0.00  …  0.09  0.00  0.00  0.00  …  0.00
d + d'    0.00  …  0.21  0.10  0.00  0.00  …  0.00  0.00  0.00  0.00  …  0.00
g + g'    0.00  …  0.65  0.88  1.00  1.00  …  0.00  0.00  0.00  0.00  …  0.00
k + k'    0.00  …  0.00  0.00  0.00  0.00  …  0.00  0.00  0.00  0.00  …  0.00
pause     1.00  …  0.00  0.00  0.00  0.00  …  0.13  0.01  0.01  0.00  …  1.00
s + s'    0.00  …  0.00  0.00  0.00  0.00  …  0.78  0.99  0.99  1.00  …  0.00
t + t'    0.00  …  0.00  0.00  0.00  0.00  …  0.00  0.00  0.00  0.00  …  0.00
vow       0.00  …  0.14  0.02  0.00  0.00  …  0.00  0.00  0.00  0.00  …  0.00

As a result, we obtained a sequence of labels.…”
Section: Phoneme Recognition and Time Alignment (unclassified)