Scene text recognition, the task that follows text detection, converts detected text regions into machine-readable, editable characters, words, or text lines. To improve the recognition of long text, a text recognition method based on the involution operator and a graph convolutional network (GCN) is proposed. The method builds a new feature sequence extraction network on the involution operator, which extracts text features over a larger receptive field and generates feature sequences. In addition, a gated recurrent unit (GRU) and the GCN help the model retain useful text features while discarding irrelevant visual information, and they enrich the contextual semantic information of the feature sequences from multiple perspectives. Comparative experiments on the ICDAR (International Conference on Document Analysis and Recognition) 2013, ICDAR 2015, and SVT (Street View Text) datasets show that the method can effectively recognize multi-scale, ambiguous, and adherent text. Without lexicon supervision, the F-score reaches 94.2%, 89.7%, and 93.8% on the three datasets, respectively.
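
The abstract reports results as F-scores but does not define the metric. The sketch below assumes the usual definition, the harmonic mean of precision and recall computed from word-level match counts. The function name `f_score` and the counts in the usage lines are hypothetical and are not taken from the reported experiments.

```python
def f_score(true_positives, predicted, ground_truth):
    """Return (precision, recall, F-score) from word-level counts.

    Assumption: F-score is the harmonic mean of precision and recall.
    true_positives: correctly recognized words
    predicted:      words output by the recognizer
    ground_truth:   words annotated in the dataset
    """
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / ground_truth if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical counts for illustration only.
p, r, f = f_score(true_positives=90, predicted=100, ground_truth=120)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a high F-score requires that the recognizer both avoids spurious outputs and misses few annotated words.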