The objective of this study is to propose a communication assistance system for patients who have difficulty moving their extremities and eyes, such as patients with advanced amyotrophic lateral sclerosis (ALS) and completely locked-in syndrome (CLIS), based on object-based visual attention and pupil size measurements. As the stimulus, we used a hybrid image created by overlapping an image with high spatial frequency (HSF) components and one with low spatial frequency (LSF) components. Five participants in Experiment 1 and eight in Experiment 2 were instructed to attend to either the HSF or the LSF image. In Experiment 1, the stimulus was constant throughout each trial; in Experiment 2, the spatial frequency conditions of the two component images were swapped within a trial at an alternation frequency of 0.16 Hz, 0.33 Hz, or 0.50 Hz. Participants were required to keep their attention on the target image at all times. In Experiment 1, maximum and mean pupil constriction values were significantly larger when the attentional target was an HSF image than when it was an LSF image. However, when we inferred the target from the difference in pupil measurements between attention to images of different spatial frequencies, the accuracy was below 70%. In Experiment 2, the slope of the pupil size change after a stimulus swap was significantly larger when the target image switched from HSF to LSF than when it switched from LSF to HSF. When we estimated the target image from this pattern, almost all participants showed an inference accuracy exceeding 70%, with the highest reaching 95.24%. The lowest alternation frequency used in the present experiment (0.16 Hz) yielded higher inference accuracy. This study proposes a low-cost, practice-free, non-invasive, visual attention-based communication support system that does not require professional assistance.
Because the stimulus is a hybrid image, patients do not need to shift their attention away from their gaze position, which reduces the complexity of operation.
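The slope-based inference in Experiment 2 can be illustrated with a minimal sketch. This is not the study's analysis pipeline. The function names, the 1 s fitting window, the labels "A" and "B" for the two component images, and the reading of "larger slope" as a more positive signed slope are all assumptions made for illustration. The idea is that the swap schedule is known, so each swap is an HSF-to-LSF switch for one component image and an LSF-to-HSF switch for the other; the attended image is guessed as the one whose HSF-to-LSF swaps are followed by the larger mean slope.

```python
import numpy as np

def post_swap_slope(pupil, fs, swap_idx, window_s=1.0):
    """Least-squares slope (pupil units per second) in a window after a swap.

    window_s is a hypothetical choice, not a value taken from the study.
    """
    n = int(round(window_s * fs))
    seg = np.asarray(pupil[swap_idx:swap_idx + n], dtype=float)
    t = np.arange(seg.size) / fs
    return np.polyfit(t, seg, 1)[0]  # first coefficient is the slope

def infer_target(pupil, fs, swap_indices, a_goes_hsf_to_lsf, window_s=1.0):
    """Guess which component image ("A" or "B") is attended.

    a_goes_hsf_to_lsf[k] is True when swap k turns image A from HSF to LSF
    (and therefore image B from LSF to HSF). Assumes both swap directions
    occur at least once in the trial.
    """
    slopes = np.array([post_swap_slope(pupil, fs, i, window_s)
                       for i in swap_indices])
    mask = np.asarray(a_goes_hsf_to_lsf, dtype=bool)
    # Assumption: the attended image shows the larger slope on HSF-to-LSF swaps.
    diff = slopes[mask].mean() - slopes[~mask].mean()
    return "A" if diff > 0 else "B"
```

In practice the pupil trace would first need blink removal and baseline correction, and the decision rule would be validated per participant; none of that is shown here.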