“…As the thriving of deep learning, the researchers also made attempts to build the text recognition models based on deep neural networks following the bottom-up fashion [2,3,5,6,22,23,29,35,38,[40][41][42][43]46,47,51,59,60,65,67,71,73,83,87,90,98,100]. For example, CRNN [59] utilizes the CNN-RNN architecture to extract features for the text images, which are further supervised with the CTC loss [24] to maximize the probability of the ground truth.…”