Neuromorphic sensors, also known as dynamic vision sensors (DVS) or silicon retinas, do not capture full images (frames) at a fixed rate. Instead, they asynchronously capture spikes that indicate changes of brightness in the scene, following the principles of biological vision and perception in mammals. DVS sensing and processing produce a data representation in which the scene is described with very high time resolution using a limited number of bits (data compression is inherently performed at the time of acquisition). Such a representation can be used locally to derive actionable responses, and selected parts can be transmitted and then processed at another network location. Owing to these features, such sensors are an excellent choice of visual sensing technology for the next-generation Internet of Things, e.g., in surveillance, drone technology, and robotics. It is becoming evident that, in this framework, acquiring, processing, and transmitting frame-based video is inefficient in terms of energy consumption and reaction time, particularly in some scenarios. Hence, we explore here the feasibility of advanced machine-to-machine (M2M) communication systems that directly capture, compress, and transmit spike-based visual information to cloud computing services in order to produce content classification or retrieval results with extremely low power and low latency.
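To make the spike-based representation concrete, the following is a minimal sketch of how brightness changes can be turned into events. It is an illustration under stated assumptions, not the sensor model or pipeline used in this work: it assumes the common idealization in which a pixel emits an event when its log intensity departs from a stored reference by more than a contrast threshold, and it approximates asynchronous operation by scanning densely sampled intensity arrays. The function name `simulate_dvs_events`, the `threshold` value, and the `(t, x, y, polarity)` event tuple are hypothetical choices made for this example.

```python
import math


def simulate_dvs_events(frames, timestamps, threshold=0.2, eps=1e-6):
    """Emit (t, x, y, polarity) events from sampled intensities.

    Assumption: idealized per-pixel log-intensity thresholding. At most one
    event is emitted per pixel per sample, and the pixel's reference is then
    reset to the current log intensity (a simplification of real sensors).
    """
    # Per-pixel reference log intensity, initialized from the first sample.
    ref = [[math.log(v + eps) for v in row] for row in frames[0]]
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y, row in enumerate(frame):
            for x, v in enumerate(row):
                log_v = math.log(v + eps)
                diff = log_v - ref[y][x]
                if abs(diff) >= threshold:
                    polarity = 1 if diff > 0 else -1  # brighter or darker
                    events.append((t, x, y, polarity))
                    ref[y][x] = log_v  # reset reference after firing
    return events


# Toy 2x2 scene: one pixel brightens, then another pixel darkens.
frames = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[1.0, 2.0], [1.0, 1.0]],
    [[1.0, 2.0], [0.5, 1.0]],
]
timestamps = [0.0, 0.001, 0.002]
events = simulate_dvs_events(frames, timestamps)
print(events)
```

In this toy scene only the two pixels that change produce events, while the static pixels produce nothing, which is the sense in which compression is performed at acquisition time.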