We present a novel representation of visual information, based on local symbolic descriptors, which we call visual primitives. These primitives (1) combine different visual modalities, (2) associate semantics with local scene information, and (3) reduce bandwidth while increasing the predictability of the information exchanged across the system. This representation leads to the concept of early cognitive vision, which we define as an intermediate level between dense, signal-based early vision and high-level cognitive vision. We briefly outline several applications that demonstrate the framework's potential, in particular in robotics and humanoid robotics.