Classical models of collective behavior often take a "bird's-eye perspective," assuming that individuals have access to social information that is not directly available to them (e.g., the behavior of individuals outside their field of view). Despite the explanatory success of those models, it is now thought that a better understanding needs to incorporate the perception of the individual, i.e., how internal and external information is acquired and processed. In particular, vision has emerged as a central channel for gathering external information and for influencing the collective organization of the group. Here we show that a vision-based model of collective behavior is sufficient to generate organized collective behavior in the absence of spatial representation and collision. Our work suggests a novel approach for the development of purely vision-based autonomous swarm robotic systems, and formulates a mathematical framework for the exploration of perception-based interactions and how they differ from physical ones. Thus, it is of broader relevance for self-organization in complex systems, neuroscience, behavioral sciences, and engineering.
Models of collective behavior often rely on phenomenological interactions of individuals with neighbors 1-4. However, in contrast to physical interactions such as gravity or electromagnetism, these social interactions do not have a direct physical reality. The behavior of individuals is influenced by their representation of the environment, acquired through sensory information. Current models often suggest that individuals respond to the state of movement of their neighbors, namely their (relative) positions and velocities, which are not explicitly encoded in the sensory stream. Thus, such phenomenological interactions implicitly assume internal processing of the sensory input in order to extract the relevant state variables.
On the other hand, neuroscience has made tremendous progress in understanding various aspects of the relation between sensory signals and movement response, yet connections to large-scale collective behavior are lacking. Although evidence has been found for neural representation of social cues in mice 5 and bats 6, the details and role of these internal representations remain unclear, in particular in the context of movement coordination. Collective behavior crucially depends on the sensory information available to individuals; ignoring perception by relying on ad-hoc rules therefore strongly limits our understanding of the underlying complexity of the problem. Moreover, it obstructs the interdisciplinary exchange between biology, neuroscience, engineering, and physics. Recently, the visual projection field has emerged as a central feature of collective movements in fish 7-10, birds 11, or humans 12. Due to the geometrical nature of vision, i.e., the projection of the environment, vision appears to be a good starting point for exploring the relationship between sensory information and emergent collective behaviors. Some models have attempted to relate vision and mo...