Identification of packaged products in retail environments still relies on barcodes, requiring active user input and limited to one product at a time. Computer vision (CV) has already enabled many applications, but has so far been under-discussed in the retail domain, albeit allowing for faster, hands-free, more natural humanobject interaction (e.g. via mixed reality headsets). To assess the potential of current convolutional neural network (CNN) architectures to reliably identify packaged products within a retail environment, we created and open-source a dataset of 300 images of vending machines with 15k labeled instances of 90 products. We assessed observed accuracies from transfer learning for imagebased product classification (IC) and multi-product object detection (OD) on multiple CNN architectures, and the number of images instances required per product to achieve meaningful predictions. Results show that as little as six images are enough for 90% IC accuracy, but around 30 images are needed for 95% IC accuracy. For simultaneous OD, 42 instances per product are necessary and far more than 100 instances to produce robust results. Thus, this study demonstrates that even in realistic, fast-paced retail environments, image-based product identification provides an alternative to barcodes, especially for use-cases that do not require perfect 100% accuracy.