With the rapid development of virtual reality (VR) technology, a large number of omnidirectional images (OIs) of uncertain quality are flooding the internet. As a result, Blind Omnidirectional Image Quality Assessment (BOIQA) has become increasingly urgent. Existing solutions mainly focus on manually or automatically extracting high-level features from OIs, overlooking the important guiding role of human visual perception in this immersive experience. To address this issue, this paper develops a dual-level network based on human visual perception for BOIQA. First, a human attention branch is proposed, in which a transformer-based model efficiently represents human visual attention features over a multi-distance perceptual image pyramid of viewports. Then, inspired by the hierarchical perception of the human visual system, a multiscale perception branch is designed, in which hierarchical features of six orientational viewports are extracted in parallel by a residual network. Additionally, correlations among viewports are investigated to assist multi-viewport feature fusion: an attention-based module measures the similarity and correlation between feature maps extracted from different viewports. Finally, the outputs of both branches are regressed by a fully connected layer to derive the final predicted quality score. Comprehensive experiments on two public datasets demonstrate the significant superiority of the proposed method.
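To make the dual-level design concrete, the sketch below outlines one plausible PyTorch realization of the pipeline summarized above. Every specific choice here (patch size, embedding widths, a ResNet-18 backbone, 4-head self-attention for viewport fusion, the feature dimensions) is an illustrative assumption, not the authors' exact implementation.

```python
# A minimal PyTorch sketch of the dual-level design summarized above.
# All architectural choices are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class HumanAttentionBranch(nn.Module):
    """Transformer encoder over patch tokens from a multi-distance
    image pyramid of a viewport (hypothetical patch/embedding sizes)."""

    def __init__(self, embed_dim=256):
        super().__init__()
        # A shared patch embedding is applied to every pyramid level.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, pyramid):
        # pyramid: list of (B, 3, H_i, W_i) tensors at different distances.
        tokens = [self.patch_embed(x).flatten(2).transpose(1, 2)
                  for x in pyramid]
        feats = self.encoder(torch.cat(tokens, dim=1))
        return feats.mean(dim=1)  # (B, embed_dim)


class MultiscalePerceptionBranch(nn.Module):
    """Residual network applied in parallel to six orientational
    viewports, followed by attention-based cross-viewport fusion."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        # Self-attention models similarity/correlation among viewports.
        self.viewport_attn = nn.MultiheadAttention(512, num_heads=4,
                                                   batch_first=True)

    def forward(self, viewports):
        # viewports: (B, 6, 3, H, W)
        b, v = viewports.shape[:2]
        feats = self.backbone(viewports.flatten(0, 1)).flatten(1)
        feats = feats.view(b, v, -1)                    # (B, 6, 512)
        fused, _ = self.viewport_attn(feats, feats, feats)
        return fused.mean(dim=1)                        # (B, 512)


class DualLevelBOIQA(nn.Module):
    """Concatenates both branch outputs and regresses a quality score."""

    def __init__(self):
        super().__init__()
        self.attn_branch = HumanAttentionBranch()
        self.ms_branch = MultiscalePerceptionBranch()
        self.regressor = nn.Sequential(nn.Linear(256 + 512, 128),
                                       nn.ReLU(),
                                       nn.Linear(128, 1))

    def forward(self, pyramid, viewports):
        f = torch.cat([self.attn_branch(pyramid),
                       self.ms_branch(viewports)], dim=1)
        return self.regressor(f).squeeze(-1)


if __name__ == "__main__":
    model = DualLevelBOIQA()
    pyramid = [torch.randn(2, 3, s, s) for s in (224, 112, 56)]
    viewports = torch.randn(2, 6, 3, 224, 224)
    print(model(pyramid, viewports).shape)  # torch.Size([2])
```

In this reading, casting viewport fusion as self-attention lets each viewport's features be reweighted by their correlation with the others, which mirrors the correlation-guided fusion described in the abstract.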