A reverse vending machine rewards citizens for returning recyclable waste, which makes it a viable way to increase the recycling rate. Reverse vending machines generally use near-infrared sensors, barcode sensors, or cameras to classify recyclable resources. However, sensor-based machines suffer from high configuration costs and a limited scope of target objects, and conventional single image-based machines often misclassify objects deliberately crafted for fraud. This paper proposes a dual image-based convolutional neural network ensemble model to address these problems. We first built a prototype reverse vending machine and constructed an image dataset containing two views of each object, top and front. We then selected convolutional neural network models widely used in image classification as candidates for building an accurate and lightweight ensemble model. Considering the size and classification performance of the candidates, we constructed the best-fit ensemble combination and evaluated its classification performance. The final ensemble model achieved a classification accuracy higher than 95% for all target classes, including fraud objects. This result shows that our approach is more robust against intentional fraud objects than single image-based models and can thus broaden the scope of target resources. Measurements on lightweight embedded platforms also demonstrated that the model's inference time is short enough for real-time operation of reverse vending machines based on low-cost edge artificial intelligence devices.
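
To illustrate the dual-image idea, the following is a minimal sketch of how predictions from a top-view model and a front-view model could be combined. It assumes soft voting (a weighted average of per-view softmax probabilities), which is one common ensemble rule and is not necessarily the combination used in this paper. The class labels, the equal weights, and the function names `softmax`, `soft_vote`, and `classify` are hypothetical, and the logits stand in for the outputs of two trained convolutional neural networks.

```python
from math import exp
from typing import List, Sequence, Tuple

# Hypothetical class labels; the abstract only states that fraud objects
# are among the target classes.
CLASSES = ["can", "plastic_bottle", "fraud"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(
    top_logits: Sequence[float],
    front_logits: Sequence[float],
    weights: Tuple[float, float] = (0.5, 0.5),  # assumed equal view weights
) -> List[float]:
    """Weighted average of the class probabilities from the two views."""
    p_top = softmax(top_logits)
    p_front = softmax(front_logits)
    w_top, w_front = weights
    norm = w_top + w_front
    return [(w_top * a + w_front * b) / norm for a, b in zip(p_top, p_front)]


def classify(top_logits: Sequence[float], front_logits: Sequence[float]) -> str:
    """Return the label with the highest ensemble probability."""
    probs = soft_vote(top_logits, front_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best]


if __name__ == "__main__":
    # Made-up scores: the top view resembles a can, while the front view
    # strongly suggests a fraud object.
    top_view = [2.0, 0.0, 1.0]
    front_view = [0.0, 0.0, 4.0]
    print(classify(top_view, front_view))
```

With these made-up scores, the top view alone would favor "can", while the averaged probabilities favor "fraud". This is the kind of case in which a second view can correct a single-image error.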