The development of deep neural networks that has been carried out in recent years allows solving highly complex computer vision classification problems. Often, although the results obtained with these classifiers are high, there are certain sectors that seek greater accuracy from these systems. Increasing the accuracy of neural networks can be achieved through ensemble learning, which combines different classifiers with the aim of selecting a winner based on different criteria about them. These techniques have traditionally shown good results although they involve training models of different nature and can even produce an overfitting with respect to the training data, so datasets must be chosen to correctly evaluate the result. In this paper, a Cross-Validation-Voting (CVV) technique for grocery product classification is presented. This technique improves several single state-of-the-art classifiers without combining different ones and avoids the problems of overfitting with respect to the training set. The single classifiers are trained multiple times against distributed sets to show how the results obtained to date from the classification of a well-known dataset are improved. In this dataset, an extensive test set was previously selected by the authors to show comparable results with other papers in the literature. The technique is valid not only for vision nets and can be used to solve numerous problems with different kinds of neural networks and classifiers.
Over the last few years, several techniques have been developed with the aim of implementing one-shot learning, a concept that allows classifying images with only a single image per training category. Conceptually, these methods seek to reproduce certain behavior that humans have. People are able to recognize a person they have only seen once, but they are probably not able to do the same with certain animals, such as a monkey. This is because our brains have been trained for years with images of people but not so much of animals. Among the one-shot learning techniques, some of them have used data generation, such as Generative Adversarial Networks (GAN). Other techniques have been based on the matching of descriptors traditionally used for object detection. Finally, one of the most prominent techniques involves using Siamese neural networks. Siamese networks are usually implemented with two convolutional nets that share their weights. They receive two images as input and can detect whether they belong to the same category or not. In the field of grocery products, there has been a lot of research on the one-shot learning problem but not so much on the use of Siamese networks. In this paper, several classifiers are firstly evaluated to decide on a convolutional model to be used with the Siamese and to improve the baseline results obtained in the dataset used. Then, two existing techniques are integrated within the Siamese model: a convolutional net and a Local Maximal Occurrence (LOMO) descriptor. The latter was initially used for the re-identification of people although it has shown its effectiveness to improve the values of a traditional Siamese with only convolutional sisters. The whole network is trained on categories and responds to different categories, showing its strong capacity to deal with the problem of having only one image per category.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.