“…In the past few years, many methods have attempted to unify text detection and recognition by proposing new Region-of-Interest (RoI) operations that create synergy between the two tasks [32,12,50,33,52], as shown in Figure 1(a). They follow the classical two-stage pipeline, which first locates each text instance and then extracts the text content from the corresponding RoI.…”
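
The two-stage pipeline described in the excerpt (locate each text instance, then read the content of its RoI) can be illustrated with a toy sketch. This is not the method of any cited work: the "image" is assumed to be a grid of characters, the detector simply boxes runs of non-blank cells, and the function names `detect_text`, `crop_roi`, `recognize`, and `two_stage_spotting` are hypothetical. Real systems replace the cropping step with a learned RoI operation on feature maps.

```python
from typing import List, Tuple

# (top, left, bottom, right), with bottom/right exclusive. Hypothetical layout.
Box = Tuple[int, int, int, int]


def detect_text(image: List[str]) -> List[Box]:
    """Stage 1 (toy detector): one box per run of non-blank cells in a row."""
    boxes: List[Box] = []
    for r, row in enumerate(image):
        start = None
        # A trailing blank acts as a sentinel that closes a run at the row end.
        for c, ch in enumerate(row + " "):
            if ch != " " and start is None:
                start = c
            elif ch == " " and start is not None:
                boxes.append((r, start, r + 1, c))
                start = None
    return boxes


def crop_roi(image: List[str], box: Box) -> List[str]:
    """RoI operation (toy): cut the boxed region out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def recognize(roi: List[str]) -> str:
    """Stage 2 (toy recognizer): read the characters inside the RoI."""
    return "".join(roi)


def two_stage_spotting(image: List[str]) -> List[Tuple[Box, str]]:
    """Detect first, then recognize each detected region independently."""
    return [(box, recognize(crop_roi(image, box))) for box in detect_text(image)]


if __name__ == "__main__":
    toy_image = [
        "  STOP    ",
        "          ",
        " EXIT  12 ",
    ]
    for box, text in two_stage_spotting(toy_image):
        print(box, text)
```

In this sketch the recognizer only ever sees what the detector cropped, so a mislocated box directly corrupts the recognized text. That dependence between the two stages is what the RoI operations in the cited methods have to manage.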