Compared with traditional 2D images, omnidirectional images (also referred to as 360° images) have more complicated perceptual characteristics because of the way they are captured and displayed. How humans perceive omnidirectional images in an immersive environment, and how they form an immersive quality of experience, are important problems. It is therefore crucial to measure the quality of omnidirectional images that suffer from realistic distortions under different viewing conditions. In this paper, we build a large-scale subjective assessment database for omnidirectional images and carry out a comprehensive psychophysical experiment to study how different factors (viewing conditions and viewing behaviors) relate to the perceptual quality of omnidirectional images. We collect both subjective ratings and head movement data, and we provide a thorough analysis of the collected subjective data, from which we draw several findings. Moreover, building on this database, we propose a novel transformer-based omnidirectional image quality assessment model. To be consistent with the human viewing process, viewing conditions and behaviors are naturally incorporated into the model. Specifically, the model consists of three main parts: viewport sequence generation, multi-scale feature extraction, and perceptual quality prediction. Extensive experiments on the proposed database demonstrate the effectiveness of the proposed method over existing image quality assessment methods.
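To make the first stage, viewport sequence generation, more concrete, the following is a minimal sketch of one generic way to cut a sequence of viewports out of an equirectangular image. It is not the method of this paper: the abstract does not specify how viewports are chosen. The function names (`viewport_centers`, `extract_viewport`), the equatorial scan path, the 30° step, the 90° field of view, and the nearest-neighbor sampling are all assumptions made for illustration only.

```python
import numpy as np


def viewport_centers(start_lon_deg, num_viewports, step_deg=30.0):
    """Hypothetical scan path: step along the equator from a starting longitude.

    Returns a list of (longitude, latitude) pairs in degrees, with
    longitudes wrapped into [-180, 180).
    """
    lons = start_lon_deg + step_deg * np.arange(num_viewports)
    lons = (lons + 180.0) % 360.0 - 180.0
    return [(float(lon), 0.0) for lon in lons]


def extract_viewport(erp, lon_deg, lat_deg, fov_deg=90.0, size=64):
    """Sample a square rectilinear viewport from an equirectangular image.

    Uses an inverse gnomonic projection with nearest-neighbor lookup
    (an assumption chosen to keep the sketch short).
    """
    h, w = erp.shape[:2]
    half = np.tan(np.radians(fov_deg) / 2.0)

    # Tangent-plane grid: x grows to the right, y grows upward (top row first).
    xs = np.linspace(-half, half, size)
    ys = np.linspace(half, -half, size)
    x, y = np.meshgrid(xs, ys)

    lon0, lat0 = np.radians(lon_deg), np.radians(lat_deg)

    # Ray (x, y, 1) in the camera frame, pitched up by lat0 and normalized.
    norm = np.sqrt(1.0 + x**2 + y**2)
    up = (np.sin(lat0) + y * np.cos(lat0)) / norm
    forward = np.cos(lat0) - y * np.sin(lat0)

    lat = np.arcsin(np.clip(up, -1.0, 1.0))
    lon = lon0 + np.arctan2(x, forward)

    # Spherical coordinates -> equirectangular pixel indices.
    u = (lon / (2.0 * np.pi) + 0.5) * w
    v = (0.5 - lat / np.pi) * h
    cols = np.floor(u).astype(int) % w               # longitude wraps around
    rows = np.clip(np.floor(v).astype(int), 0, h - 1)  # latitude is clamped
    return erp[rows, cols]


def viewport_sequence(erp, start_lon_deg, num_viewports, **kwargs):
    """Stack the viewports along the hypothetical scan path into one array."""
    centers = viewport_centers(start_lon_deg, num_viewports)
    return np.stack([extract_viewport(erp, lon, lat, **kwargs)
                     for lon, lat in centers])
```

In a sketch like this, a viewing condition such as the starting point would map to `start_lon_deg`, and a longer exploration time would map to a larger `num_viewports`. In practice, recorded head movement data could replace the fixed equatorial path.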