In retail environments, understanding how shoppers move about in a store’s spaces and interact with products is very valuable. While the retail environment has several favourable characteristics that support computer vision, such as reasonable lighting, the large number and diversity of products sold, as well as the potential ambiguity of shoppers’ movements, mean that accurately measuring shopper behaviour is still challenging. Over the past years, machine-learning and feature-based tools for people counting as well as interactions analytic and re-identification were developed with the aim of learning shopper skills based on occlusion-free RGB-D cameras in a top-view configuration. However, after moving into the era of multimedia big data, machine-learning approaches evolved into deep learning approaches, which are a more powerful and efficient way of dealing with the complexities of human behaviour. In this paper, a novel VRAI deep learning application that uses three convolutional neural networks to count the number of people passing or stopping in the camera area, perform top-view re-identification and measure shopper–shelf interactions from a single RGB-D video flow with near real-time performances has been introduced. The framework is evaluated on the following three new datasets that are publicly available: TVHeads for people counting, HaDa for shopper–shelf interactions and TVPR2 for people re-identification. The experimental results show that the proposed methods significantly outperform all competitive state-of-the-art methods (accuracy of 99.5% on people counting, 92.6% on interaction classification and 74.5% on re-id), bringing to different and significative insights for implicit and extensive shopper behaviour analysis for marketing applications.