Recently, visual quality evaluation of screen content images (SCIs) has become an important and timely research topic. This paper presents a novel and effective blind quality evaluation metric for SCIs that uses stacked auto-encoders (SAEs) applied to pictorial and textual regions. Since an SCI contains both pictorial and textual regions, and the human visual system (HVS) is not equally sensitive to the distortion types affecting each, the two regions are treated separately. First, the textual and pictorial regions are obtained by dividing an input SCI with an SCI segmentation method. Next, quality-aware features are extracted from the textual and pictorial regions separately. Two different SAEs are then trained in an unsupervised manner on the quality-aware features extracted from these two regions. After this training procedure, the SAEs transform the quality-aware features into more discriminative and meaningful representations. The evolved features and their corresponding subjective scores are subsequently used to train two regressors, each of which outputs a predicted score. Finally, the perceptual quality score of a test SCI is computed from these two predicted scores through a weighted model. Experimental results on two public SCI-oriented databases show that the proposed scheme compares favorably with existing blind image quality assessment metrics.

Index Terms - Screen content image (SCI), human visual system (HVS), stacked auto-encoders (SAE), quality-aware features, unsupervised approach.
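As a rough illustration of the two-branch pipeline described above, the sketch below trains one regressor per region and fuses the two branch predictions with a weighted model. It is a minimal sketch under stated assumptions, not the paper's implementation: random vectors stand in for the quality-aware features that would be extracted from the textual and pictorial regions (and refined by the trained SAEs), SVR is used as a generic stand-in regressor, and the fusion weight is an assumed placeholder rather than the paper's weighting.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical stand-ins: random vectors replace the quality-aware features
# that the paper extracts from the textual and pictorial regions and refines
# with the unsupervised-trained SAEs.
rng = np.random.default_rng(0)
n_train, n_feats = 200, 32
feats_text = rng.normal(size=(n_train, n_feats))   # textual-region features
feats_pict = rng.normal(size=(n_train, n_feats))   # pictorial-region features
mos = rng.uniform(1.0, 5.0, size=n_train)          # subjective quality scores

# One regressor per region, trained on (evolved features, subjective scores).
reg_text = SVR().fit(feats_text, mos)
reg_pict = SVR().fit(feats_pict, mos)

def predict_quality(f_text, f_pict, w=0.5):
    """Fuse the two branch predictions through a weighted model.

    w is an assumed placeholder weight; the paper's actual weighting
    scheme may differ.
    """
    q_text = reg_text.predict(f_text.reshape(1, -1))[0]
    q_pict = reg_pict.predict(f_pict.reshape(1, -1))[0]
    return w * q_text + (1.0 - w) * q_pict

# Example: predicted quality score for one (synthetic) test SCI.
print(predict_quality(rng.normal(size=n_feats), rng.normal(size=n_feats)))
```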