The respiratory rate is an important vital parameter that provides information about a person's physical condition. In clinical practice it is currently monitored only with contact-based techniques, which can have negative effects on patients. In this study, a new algorithm for remote respiratory rate recognition is presented that uses photoplethysmographic signals derived from facial video recordings in the visible light spectrum. The effects of the individual implementation steps of the presented algorithm are investigated in order to optimize the approach and to gain new insights into this research field. In addition, a detailed examination of previously implemented procedures is performed, and the results are compared on two different databases. We show that by fusing the results of seven different respiratory-induced modulations in combination with further processing steps, very good estimates of the respiratory rate are achieved on both moving and non-moving data. The obtained detection rates of 72.16 % and 87.68 % are significantly higher than those of the best comparison algorithm, which reaches 37.37 % and 59.13 %. The comparison algorithms developed so far are not competitive with the newly designed method, especially on video recordings of persons in motion. This paper thus provides the research community with important new findings in the field of facial video-based respiratory rate recognition: a new method that delivers significantly better estimates of the respiratory rate than previously developed techniques.
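To illustrate the fusion idea described above, here is a minimal sketch in Python (NumPy/SciPy) that derives three of the classic respiratory-induced modulations (intensity, amplitude, and frequency variation) from a PPG trace and fuses their spectral rate estimates by taking the median. The paper fuses seven modulations, whose exact set the abstract does not specify; the 30 fps sampling rate, band limits, and modulation choices below are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, find_peaks

FS = 30.0  # assumed video frame rate in Hz

def bandpass(x, lo, hi, fs):
    b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def dominant_freq(x, fs, lo=0.1, hi=0.5):
    """Spectral peak inside an assumed respiratory band (6-30 breaths/min)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    return freqs[band][np.argmax(spec[band])]

def respiratory_rate(ppg, fs=FS):
    # RIIV: intensity variation, i.e. the baseline wander of the raw signal
    riiv = bandpass(ppg, 0.1, 0.5, fs)
    # RIAV: amplitude variation, taken as the envelope of the cardiac band
    cardiac = bandpass(ppg, 0.5, 4.0, fs)
    riav = np.abs(hilbert(cardiac))
    # RIFV: frequency variation, from interpolated beat-to-beat intervals
    peaks, _ = find_peaks(cardiac, distance=int(fs / 4))
    rifv = np.interp(np.arange(len(cardiac)), peaks[1:], np.diff(peaks) / fs)
    # Fuse the three per-modulation estimates by taking their median
    estimates = [dominant_freq(m - np.mean(m), fs) for m in (riiv, riav, rifv)]
    return float(np.median(estimates)) * 60.0  # breaths per minute
```

Fusing by median makes the estimate robust against a single modulation being corrupted, e.g., by motion artifacts, which is one plausible reason why fusion helps on moving data.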
Pain is a reliable indicator of health issues and affects patients' quality of life when not well managed. Current methods in clinical practice are subject to bias and error and, moreover, do not allow continuous pain monitoring. Recent methodologies in automatic pain assessment address this gap, demonstrating that pain can be measured and monitored objectively and robustly using behavioral cues and physiological signals. This paper introduces a reliable automatic system for continuous monitoring of pain intensity that analyzes behavioral cues, such as facial expressions and audio, and physiological signals, such as electrocardiogram (ECG), electromyogram (EMG), and electrodermal activity (EDA), from the X-ITE Pain Dataset. Several classification and regression experiments were conducted on 11 datasets derived from the database to reduce the impact of its class imbalance. For each single-modality (uni-modality) experiment, we used a Random Forest (RF) baseline method, a Long Short-Term Memory (LSTM) method, and an LSTM with sample weighting (called LSTM-SW). LSTM and LSTM-SW were further used in fused-modality experiments (two modalities = bi-modality; all modalities = multi-modality). Sample weighting downweights misclassified samples during training to improve performance. The results confirm that regression outperforms classification on imbalanced datasets, that EDA is the best single modality, and that fused modalities improve performance significantly over single modalities in 10 out of 11 datasets.
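To make the LSTM-SW idea concrete, the following sketch (Keras/TensorFlow) trains an LSTM once, identifies misclassified training samples, and retrains with those samples down-weighted via Keras' sample_weight argument. The window shape, the down-weight factor of 0.5, and the placeholder data are assumptions, not taken from the paper.

```python
import numpy as np
import tensorflow as tf

# Assumed shapes: windows of 100 time steps with 4 physiological features.
T, F, N_CLASSES = 100, 4, 2

def build_lstm(t=T, f=F, n_classes=N_CLASSES):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(t, f)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

# Placeholder training data standing in for the X-ITE-derived datasets.
x_train = np.random.randn(256, T, F).astype("float32")
y_train = np.random.randint(0, N_CLASSES, size=256)

# First pass: plain LSTM training.
model = build_lstm()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, epochs=5, verbose=0)

# LSTM-SW idea from the abstract: down-weight misclassified samples.
pred = model.predict(x_train, verbose=0).argmax(axis=1)
weights = np.where(pred == y_train, 1.0, 0.5)  # assumed down-weight factor

# Second pass: retrain from scratch with per-sample weights.
model = build_lstm()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, sample_weight=weights, epochs=5, verbose=0)
```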
Face and person detection are important tasks in computer vision, as they form the first component of many recognition systems, such as face recognition, facial expression analysis, body pose estimation, face attribute detection, or human action recognition. Their detection rate and runtime are therefore crucial for the performance of the overall system. In this paper, we combine face and person detection in one framework, with the goal of reaching a detection performance competitive with the state of the art of lightweight object-specific networks while maintaining real-time processing speed for both detection tasks together. To combine face and person detection in one network, we apply multi-task learning. The difficulty is that no datasets are available that contain both face and person annotations. Since manual annotation is very time-consuming and automatic ground-truth generation yields annotations of poor quality, we solve this issue algorithmically through a special training procedure and network architecture, without the need to create new labels. Our newly developed method, called Simultaneous Face and Person Detection (SFPD), detects persons and faces at 40 frames per second. Because of this good trade-off between detection performance and inference time, SFPD is a useful and valuable real-time framework for a multitude of real-world applications such as human–robot interaction.
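One common way to train a two-head detector on disjointly annotated datasets is sketched below in PyTorch: a shared backbone with one head per task, where each training batch backpropagates only the loss of the head whose annotations its source dataset provides, so the unannotated task receives no wrong supervision. The layer sizes, the MSE placeholder loss, and the SFPDSketch name are illustrative assumptions, not the paper's actual architecture or training procedure.

```python
import torch
import torch.nn as nn

class SFPDSketch(nn.Module):
    """Illustrative shared-backbone, two-head detector."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One 1x1 conv head per task, each predicting (objectness + box).
        self.face_head = nn.Conv2d(64, 5, 1)
        self.person_head = nn.Conv2d(64, 5, 1)

    def forward(self, x):
        feats = self.backbone(x)
        return self.face_head(feats), self.person_head(feats)

def multitask_loss(face_out, person_out, targets, source):
    """Only the head matching the batch's source dataset is supervised;
    the other head's output is ignored for this batch."""
    criterion = nn.MSELoss()  # placeholder for a real detection loss
    if source == "face_dataset":
        return criterion(face_out, targets)
    return criterion(person_out, targets)
```

Alternating batches from a face-only and a person-only dataset then trains both heads over time while the shared backbone benefits from all images, which avoids creating any new labels.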
A micro-expression is an involuntary facial expression that reveals genuine emotions a person cannot hide. Micro-expressions last only a short duration and have very low intensity, which makes micro-expression recognition a challenging task. Recent research applying 3D convolutional neural networks (CNNs) to video-based micro-expression analysis has gained much popularity; for this task, both spatial and temporal features are of great importance for achieving high accuracies. A person's real, possibly suppressed emotions are valuable information for a variety of applications, such as security, psychology, neuroscience, and medicine. This paper proposes a 3D CNN model architecture that extracts spatial and temporal features simultaneously. Thereby, the selection of the frame sequence plays a crucial role, since the emotions are distinctive only in a subset of the frames. We therefore employ a novel pre-processing technique that selects the apex frame sequence from the entire video, such that the timestamp of the most pronounced emotion is centered within the sequence. An extensive evaluation comprising many experiments shows that train–test split evaluation is biased toward a particular split and cannot be recommended for small and imbalanced datasets. Instead, a stratified K-fold evaluation technique is used, which proves much more appropriate for the three benchmark datasets CASME II, SMIC, and SAMM. Moreover, intra-dataset and cross-dataset evaluations were conducted in a total of eight different scenarios. For comparison, two state-of-the-art networks were reimplemented and compared with the presented architecture. In stratified K-fold evaluation, our proposed model outperforms both reimplemented state-of-the-art methods in seven out of eight evaluation scenarios.
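A minimal sketch of the two evaluation-relevant steps follows: apex-centered frame selection and stratified K-fold splitting via scikit-learn. The sequence length of 16 frames and the fold count of 5 are assumptions for illustration, not values taken from the paper.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def apex_centered_sequence(frames, apex_idx, length=16):
    """Select a window of `length` frames so the apex frame (the most
    pronounced emotion) sits at its center, clamped to the clip bounds."""
    start = max(0, min(apex_idx - length // 2, len(frames) - length))
    return frames[start:start + length]

def stratified_folds(labels, k=5, seed=0):
    """Stratified K-fold keeps the class ratio of small, imbalanced
    micro-expression datasets identical in every fold, avoiding the
    bias of a single lucky or unlucky train-test split."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    return list(skf.split(np.zeros(len(labels)), labels))
```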