Head poses are a key component of human bodily communication and thus a decisive element of human-computer interaction. Real-time head pose estimation is crucial in the context of human-robot interaction or driver assistance systems. The most promising approaches for head pose estimation are based on Convolutional Neural Networks (CNNs). However, CNN models are often too complex to achieve real-time performance. To address this challenge, we explore a popular subgroup of CNNs, the Residual Networks (ResNets), and modify them to reduce their number of parameters. The ResNets are adapted to different image sizes, including low-resolution images, and combined with a varying number of layers. They are trained on in-the-wild datasets to ensure real-world applicability. As a result, we demonstrate that the performance of the ResNets can be maintained while reducing the number of parameters. The modified ResNets achieve state-of-the-art accuracy and provide fast inference for real-time applicability.
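To make the link between network size and parameter count concrete, the following is a minimal sketch that counts the weights of one basic residual block (two 3x3 convolutions, each followed by batch normalization) at two channel widths. The block layout, the widths 64 and 32, and all function names are illustrative assumptions; the abstract does not state how the ResNets were actually modified.

```python
def conv_params(kernel, c_in, c_out):
    """Weights of a bias-free 2D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out


def batchnorm_params(channels):
    """Learnable scale and shift, one pair per channel."""
    return 2 * channels


def basic_block_params(channels):
    """Assumed block: two 3x3 convolutions, each followed by batch norm,
    with the same number of input and output channels."""
    conv = 2 * conv_params(3, channels, channels)
    bn = 2 * batchnorm_params(channels)
    return conv + bn


# Hypothetical widths chosen only for illustration.
wide = basic_block_params(64)
narrow = basic_block_params(32)
print(wide, narrow, round(wide / narrow, 2))
```

Because the convolution weights scale with the product of input and output channels, halving the width in this sketch cuts the block's parameters by roughly a factor of four.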
Recent advances in machine olfaction have demonstrated deep learning algorithms' capabilities in mining patterns in chemosensor data [1, 2]. While these algorithms can perform effective automated feature extraction, they depend on large amounts of data. However, there is a pressing need to develop olfactory systems that learn rapidly and adapt continuously in time-critical applications such as gas leak monitoring or fire detection. The primary objectives of this work include the demonstration of rapid-learning and generalization capabilities on chemical sensor data. A simple application is considered in this direction where a system's readiness for rapid classification of gas mixtures is tested. The system consists of a low-cost metal-oxide sensor array that responds to gas mixtures from a headspace. The application in focus is the binary classification of gas sensor data (beverage vs. air). The primary choice of model for algorithm development is a convolutional neural network due to its promising inference capabilities. Owing to these algorithms' data-hungry nature, a partial meta-learning approach, known as transfer learning, is adopted. A baseline convolutional neural network is trained on the sensor data to distinguish beverages from air. This baseline model is fine-tuned on new beverages, referred to as novel classes, using a one-shot and a five-shot regime. Results show that the fine-tuned models successfully distinguish new beverages from a minimal amount of data while also overcoming the challenges posed by the chemical sensors, such as short-term and long-term drifts in the measurements. The resulting models perform favourably, with an average test accuracy of 0.9165 for one-shot learning and 0.9170 for five-shot learning, compared with an average baseline model test accuracy of 0.9356.
Despite fine-tuning on novel classes, the model preserves its generalizability and is immune to catastrophic forgetting, a shortcoming often caused by iterative training of neural networks.
Method
The aroma of fruit juices results from a complex combination of several volatile organic compounds such as esters, aldehydes, alcohols, ketones, and hydrocarbons [3] and depends on ambient factors such as humidity and temperature. While the composition of each fruit juice varies depending on the fruits used, a fruity odour characterizes any fruit beverage. Therefore, an overarching motivation for this study is to detect the presence of fruity odour with chemical sensors. 50 ml samples of commercial juices, namely orange, apple, blackcurrant, and multivitamin, are used for data collection. The sensor setup consists of eight AS-MLV metal-oxide (MOX) sensors as well as humidity and temperature sensors. The MOX sensors are heated by supplying pulse-modulated voltages for 12 seconds. Voltages are recorded continuously from four MOX sensors placed in the juice headspace. The data collection is carried out over multiple days. Classification of time-series voltages is performed by a convolutional neural network composed of a feature extractor and a classifier (see Fig. 1). The transfer learning approach, inspired by Sun et al. [4], is split into meta-training and meta-testing stages. In the meta-training stage, both the feature extractor and the classifier are trained with data corresponding to the Air and Beverage (a selection of three juices) classes, resulting in a baseline model. A novel juice's data is considered in the meta-testing phase. Classifier weights are fine-tuned with a small support set comprising one or five samples carrying maximum information. Four experiments are performed, each holding out one juice for the meta-testing dataset. The models are tested for catastrophic forgetting effects after fine-tuning.
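The meta-testing step described above, where only the classifier weights are updated on a small support set, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the frozen feature extractor is replaced by a fixed random projection with ReLU, the classifier is a single logistic unit, the data are synthetic, and the support set is assumed to contain five Air and five novel-beverage samples. All names, sizes, and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the pretrained convolutional feature extractor:
# a fixed random projection followed by ReLU. It is never updated below.
W_feat = rng.normal(size=(120, 16)) / np.sqrt(120)


def extract_features(x):
    return np.maximum(x @ W_feat, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(w, b, feats, y):
    """Binary cross-entropy of a logistic classifier on fixed features."""
    p = np.clip(sigmoid(feats @ w + b), 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fine_tune_classifier(w, b, x_support, y_support, lr=0.05, steps=200):
    """Gradient descent on the classifier weights only."""
    feats = extract_features(x_support)  # extractor stays frozen
    w = w.copy()
    for _ in range(steps):
        grad = sigmoid(feats @ w + b) - y_support
        w -= lr * feats.T @ grad / len(y_support)
        b -= lr * float(np.mean(grad))
    return w, b


# Synthetic support set (assumption): five novel-beverage shots (label 1)
# and five air shots (label 0), each a flattened 120-value voltage trace.
x_support = np.vstack([rng.normal(1.0, 1.0, size=(5, 120)),
                       rng.normal(-1.0, 1.0, size=(5, 120))])
y_support = np.array([1.0] * 5 + [0.0] * 5)

w0, b0 = np.zeros(16), 0.0
W_feat_before = W_feat.copy()
loss_before = bce_loss(w0, b0, extract_features(x_support), y_support)
w1, b1 = fine_tune_classifier(w0, b0, x_support, y_support)
loss_after = bce_loss(w1, b1, extract_features(x_support), y_support)
print(loss_before, loss_after)
```

Keeping the extractor fixed means the representation learned during meta-training is untouched, which is one plausible reason a fine-tuned model of this kind retains its earlier behaviour.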
Furthermore, zero-shot testing of the baseline model is done to justify the need for fine-tuning and to determine the performance improvement on the new composition of the Beverage class.
Results
Meta-training and meta-testing results are presented in Fig. 2. The average validation accuracy during meta-training is 0.9356. Upon fine-tuning with a one-shot regime, the resulting average accuracy on the query set is 0.9165, whereas the five-shot regime results in a slightly higher accuracy of 0.9170. This indicates that a pretrained convolutional neural network can learn and generalize a novel juice from a small amount of data. On the other hand, the low zero-shot test results show that, apart from apple juice, the network cannot generalize the novel juice samples to the Beverage class without at least one sample of the novel juice. The fine-tuned network does not undergo significant catastrophic forgetting, likely due to the shallow network architecture. The current results suggest that a model pretrained with a combination of juices can detect a new juice upon fine-tuning with a very small number of samples. The collected data can be used in the future for performance optimization by compensating for the influence of dynamic conditions such as room air quality, room temperature, humidity, and juice temperature.
References
[1] P. Peng, X. Zhao, X. Pan, and W. Ye, “Gas Classification using Deep Convolutional Neural Networks,” Sensors, vol. 18, no. 1, p. 157, Jan. 2018, doi: 10.3390/s18010157.
[2] X. Hu, Q. Liu, H. Cai, and F. Li, “Gas Recognition under Sensor Drift by using Deep Learning,” in Advances in Intelligent Systems and Computing, Springer Berlin Heidelberg, 2014, pp. 23–33.
[3] M. Riu-Aumatell, M. Castellari, E. López-Tamames, S. Galassi, and S. Buxaderas, “Characterisation of Volatile Compounds of Fruit Juices and Nectars by HS/SPME and GC/MS,” Food Chemistry, vol. 87, no. 4, pp. 627–637, Oct. 2004, doi: 10.1016/j.foodchem.2003.12.033.
[4] Q. Sun, Y. Liu, T. Chua, and B. Schiele, “Meta-Transfer Learning for Few-Shot Learning,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 403–412, doi: 10.1109/CVPR.2019.00049.
Figure 1
Metal oxide (MOX) sensors offer a low-cost solution to detect volatile organic compound (VOC) mixtures. However, their operation involves time-consuming heating cycles, which slows down both data collection and classification. This work introduces a few-shot learning approach that promotes rapid classification. In this approach, a model trained on several base classes is fine-tuned to recognize a novel class using a small number (n = 5, 25, 50 and 75) of randomly selected novel-class measurements (shots). The dataset comprises MOX sensor measurements of four different juices (apple, orange, currant and multivitamin) and air, collected over 10-minute phases using a pulse heater signal. While a high average accuracy of 82.46% is obtained for five-class classification using 75 shots, the model’s performance depends on the juice type. One-shot validation showed that not all measurements within a phase are representative, necessitating careful shot selection to achieve high classification accuracy. Error analysis revealed contamination of some measurements by the previously measured juice, a characteristic of MOX sensor data that is often overlooked and is equivalent to mislabeling. Three strategies are adopted to overcome this: (E1) fine-tuning after dropping the initial and final measurements of each phase, (E2) fine-tuning after dropping the first half of each phase, and (E3) pretraining with data from the second half of each phase. Results show that each strategy performs best for a specific number of shots. E3 results in the highest performance for five-shot learning (accuracy 63.69%), whereas E2 yields the best results for 25- and 50-shot learning (accuracies 79% and 87.1%) and E1 predicts best for 75-shot learning (accuracy 88.6%). Error analysis also showed that, for all strategies, more than 50% of air misclassifications resulted from contamination, but E1 was affected the least.
This work demonstrates how strongly data quality can affect prediction performance, especially for few-shot classification methods, and that a data-centric approach can improve the results.
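The shot-selection idea behind strategies E1 and E2 can be sketched as follows: restrict the pool of candidate measurements within a phase before drawing shots at random. This is a minimal sketch under stated assumptions. The number of measurements per phase (50 here), the number of edge measurements dropped for E1 (`n_edge`), and the function names are hypothetical, since the abstract does not give these details.

```python
import random


def candidate_indices(phase_length, strategy, n_edge=2):
    """Return the measurement indices within one phase that may be drawn
    as shots. `n_edge` is an assumed value; the source does not say how
    many initial/final measurements are dropped for E1."""
    idx = list(range(phase_length))
    if strategy == "E1":   # drop initial and final measurements
        return idx[n_edge:phase_length - n_edge]
    if strategy == "E2":   # drop the first half of the phase
        return idx[phase_length // 2:]
    return idx             # no filtering


def select_shots(phase_length, n_shots, strategy="none", seed=0):
    """Randomly pick `n_shots` distinct measurements from the allowed pool."""
    pool = candidate_indices(phase_length, strategy)
    return sorted(random.Random(seed).sample(pool, n_shots))


# Hypothetical phase of 50 measurements, five shots per strategy.
for strategy in ("none", "E1", "E2"):
    print(strategy, select_shots(50, 5, strategy))
```

Measurements early in a phase are the ones most likely to carry residue from the previously measured juice, so excluding them from the pool is a simple data-centric way to avoid drawing effectively mislabeled shots.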