This paper proposes a simple and efficient method for jointly recognizing horizontally and vertically written scene text. Recently, end-to-end scene text recognition using Transformer-based autoregressive encoder-decoder models has achieved high recognition accuracy. Research on this approach has mainly focused on horizontally written text, but in several Asian countries text is also written vertically. To train a single model efficiently for both horizontal and vertical writing, several methods have been proposed that share only some model components between the two writing directions. However, this approach lowers training efficiency because the non-shared components are trained only on horizontal or only on vertical writing data. To increase training efficiency, our key idea is to represent the writing direction in the continuous space of a model whose components are fully shared between horizontal and vertical writing. To this end, the proposed method shares all components across both writing directions and gives the writing direction to the autoregressive decoder as its initial token. Furthermore, to incorporate features common to both writing directions into the model, the proposed method predicts the character count before predicting the character string. Experiments on Japanese scene text recognition demonstrate the effectiveness of the proposed method.
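To make the decoder-side idea concrete, the following is a minimal sketch of how a target sequence could be laid out when the writing direction is the initial token and the character count is predicted before the characters. It assumes a character-level vocabulary extended with hypothetical special tokens (`<hor>`, `<ver>`, `<eos>`) and a hypothetical single count token of the form `<len=n>`; the abstract does not specify how the count is encoded. The image encoder and the actual network are omitted, and the decoder step is a placeholder callable, so this illustrates sequence construction only, not the authors' implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical special tokens (assumption: the paper's vocabulary is not given).
DIRECTION_TOKENS = {"horizontal": "<hor>", "vertical": "<ver>"}
EOS = "<eos>"


def count_token(n: int) -> str:
    """Encode the character count as one hypothetical token, e.g. '<len=3>'."""
    return f"<len={n}>"


def build_decoder_target(text: str, direction: str) -> List[str]:
    """Full decoder sequence: direction token, count token, characters, EOS."""
    if direction not in DIRECTION_TOKENS:
        raise ValueError(f"unknown direction: {direction!r}")
    chars = list(text)
    return [DIRECTION_TOKENS[direction], count_token(len(chars))] + chars + [EOS]


def teacher_forcing_pair(seq: List[str]) -> Tuple[List[str], List[str]]:
    """Shift for training: the direction token is only ever an input,
    so the first predicted token is the count, then the characters."""
    return seq[:-1], seq[1:]


def greedy_decode(step: Callable[[List[str]], str], direction: str,
                  max_len: int = 64) -> List[str]:
    """Autoregressive decoding seeded with the direction token.
    `step` stands in for a shared decoder that returns the next token."""
    prefix = [DIRECTION_TOKENS[direction]]
    while len(prefix) < max_len:
        nxt = step(prefix)
        prefix.append(nxt)
        if nxt == EOS:
            break
    return prefix


if __name__ == "__main__":
    seq = build_decoder_target("東京駅", "vertical")
    print(seq)  # ['<ver>', '<len=3>', '東', '京', '駅', '<eos>']
    print(teacher_forcing_pair(seq))
```

Because the same decoder handles both directions and only the initial token differs, every parameter receives gradients from both horizontal and vertical samples, which is the training-efficiency argument made above.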