Verbs of perception are known for their prolific use in various non-literal functions that are usually argued to have developed from their denotational semantics (San Roque, Kendrick, Norcliffe & Majid 2018). In this study we document interactional practices involving the Estonian 2nd person verb form näed ‘you see’ to demonstrate that its usage is anchored in face-to-face situations where the speaker guides a co-present other’s visual attention. Through multimodal analysis we show how näed is coordinated with the participants’ body orientations, gestures, and gazes to point to visually available proof for the speaker’s current arguments, lending it an evidential meaning even in its most “literal” uses of seeing, when a co-participant is invited to consider the visual evidence. We argue that the spatially anchored uses constitute a natural habitat of verbs of seeing, as visual perception is a mutually calibrated interactional accomplishment. Relevant syntactic constructions emerge in real-time conversation where näed, calling for a visual orientation, is either preceded or followed by clausal specifications of what is to be seen, which makes it look like a particle. This challenges the argument that perception verbs start out as syntactic predicates in full clauses and only then develop other uses.