“…However, the power of purely bottom-up saliency-based models to predict naturalistic viewing is limited, with other work suggesting that endogenous features like task instructions [e.g., "estimate the ages of the people in the painting", 2, see also, 3], prior knowledge [e.g., an octopus does not belong in a barnyard scene, 4], and viewing biases [e.g., the tendency to view faces and text, 5, see also, 6] can also be used to predict gaze allocation and to improve the performance of saliency-based models [6-8, for review, see 9]. The combined influence of these cognitive factors on viewing can be summed into "meaning maps", an analogue to saliency maps generated by crowdsourcing ratings of "meaningfulness" (informativeness + recognizability) for each region of a scene [10]. When compared directly, meaning maps significantly outperform saliency maps in predicting eye movements during naturalistic scene viewing, suggesting that visual saliency alone is insufficient to model human gaze behavior.…”