Image aesthetic assessment (IAA) is a challenging computer vision task that aims to evaluate image beauty automatically by simulating human aesthetic perception. Although convolutional neural network (CNN)-based IAA approaches have made remarkable progress with the development of deep learning, CNNs have difficulty capturing long-distance relationships among visual elements, and image layout is strongly correlated with semantic information in aesthetic judgment. To address this problem, an another-scale-guided parallel transformer is proposed, comprising a multiscale local feature extractor (ME), a feature projection (FP) module, and an another-scale-guided parallel feature fusion transformer (AST). The ME captures primary local features at multiple scales with a classic ResNet. The FP performs dimension transformation on the feature maps of each scale, yielding feature tokens and an aesthetic token per scale. The AST, built from two parallel transformer encoders, highlights the significant regions of the holistic image: the feature tokens of one scale are grouped with the aesthetic token from the other scale to provide interscale guidance. The final score distribution is obtained by weighting the multiple aesthetic tokens with learnable parameters for unified aesthetic assessment. Extensive experiments on two public datasets, the aesthetic visual analysis (AVA) dataset and the aesthetics and attributes database (AADB), demonstrate that the proposed method outperforms state-of-the-art methods across three different tasks.
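The final fusion step described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the token dimension, the number of score bins, the linear scoring head, and all variable names are assumptions introduced for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: one aesthetic token per scale (two scales here),
# each a 64-dim vector produced by the transformer encoders.
embed_dim, num_bins = 64, 10
aesthetic_tokens = [rng.standard_normal(embed_dim) for _ in range(2)]

# Learnable scalar weights over the scales (initialized uniform here;
# in training they would be optimized jointly with the network).
scale_weights = softmax(np.zeros(len(aesthetic_tokens)))

# Hypothetical linear head mapping the fused token to score bins 1..10.
head = rng.standard_normal((embed_dim, num_bins)) * 0.1

# Weighted combination of aesthetic tokens, then a score distribution.
fused = sum(w * t for w, t in zip(scale_weights, aesthetic_tokens))
score_dist = softmax(fused @ head)                       # sums to 1
mean_score = (score_dist * np.arange(1, num_bins + 1)).sum()
```

A distribution over discrete score bins (rather than a single scalar) is the common output format for IAA on AVA-style annotations, since each image carries a histogram of human ratings.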