Deep learning has been widely used to implement human activity recognition from wearable sensors such as inertial measurement units. The performance of deep activity recognition depends heavily on the amount and variability of the labeled data available for training, yet collecting and labeling such data is costly and time-consuming. With limited training data, it is hard to maintain high performance across a wide range of subjects, because the training and test sets differ in their underlying data distributions. In this work, we develop a novel solution that applies adversarial learning to improve cross-subject performance, both by generating training data that mimic artificial subjects (i.e., data augmentation) and by forcing the activity classifier to ignore subject-dependent information. Contrary to domain adaptation methods, our solution does not use any data from the subjects of the test set (the target domain). Furthermore, our solution is versatile: it can be combined with any deep neural network as the classifier. On the open dataset PAMAP2, training a CNN-LSTM-based classifier with our solution yields nearly 10% higher cross-subject performance in terms of F1-score. A performance gain of 5% is also observed when our solution is applied to a state-of-the-art HAR classifier that combines an inception neural network with a recurrent neural network. We also investigate different factors influencing classification performance (i.e., the selection of sensor modalities, sampling rates, and the number of subjects in the training data) and summarize a practical guideline for implementing deep learning solutions for sensor-based human activity recognition.
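To make the subject-invariance idea concrete, below is a minimal sketch of subject-adversarial training in PyTorch: a gradient-reversal layer drives the shared features to be uninformative about subject identity while the activity head is trained as usual. The module names, layer sizes, and class counts are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

# Hypothetical sizes: 9 sensor channels, 12 activities, 8 training subjects.
feature_extractor = nn.Sequential(
    nn.Conv1d(9, 64, kernel_size=5), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten())
activity_head = nn.Linear(64, 12)
subject_head = nn.Linear(64, 8)
ce = nn.CrossEntropyLoss()

def training_step(x, y_activity, y_subject, lam=1.0):
    """x: (batch, channels, time); returns the combined adversarial loss."""
    z = feature_extractor(x)
    activity_loss = ce(activity_head(z), y_activity)
    # Reversed gradients penalize features that reveal who the subject is.
    subject_loss = ce(subject_head(GradientReversal.apply(z, lam)), y_subject)
    return activity_loss + subject_loss
```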
Recent advances in deep learning have granted unrivaled performance to sensor-based human activity recognition (HAR). However, in real-world scenarios, a HAR solution is subject to diverse changes over time, such as the need to learn new activity classes or variations in the data distribution of the already-included activities. To solve these issues, previous studies have directly applied continual learning methods borrowed from the computer vision domain, where the topic has been extensively explored. Unfortunately, these methods either lead to surprisingly poor results or demand copious computational resources, which is infeasible for the low-cost, resource-constrained devices used in HAR. In this paper, we provide a resource-efficient and high-performance continual learning solution for HAR. It consists of an expandable neural network trained with a replay-based method that utilizes a highly compressed replay memory whose samples are selected to maximize data variability. Experiments with four open datasets, conducted on two distinct microcontrollers, show that our method achieves substantial accuracy improvements over continual learning baselines such as Gradient Episodic Memory, while utilizing only one-third of the memory and running up to 3x faster.
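As an illustration of the replay idea, the sketch below pairs a simple int8 quantization step (standing in for the paper's replay-memory compression) with a greedy farthest-point heuristic for picking diverse samples. Both the heuristic and the data shapes are assumptions, not the authors' exact selection criterion.

```python
import numpy as np

def compress(windows, scale=None):
    """Quantize float sensor windows to int8 (~4x smaller replay memory)."""
    scale = scale or float(np.abs(windows).max()) / 127.0
    return (windows / scale).round().astype(np.int8), scale

def select_diverse(samples, k):
    """Greedily pick k samples that maximize variability (farthest-point)."""
    k = min(k, len(samples))
    flat = samples.reshape(len(samples), -1).astype(np.float32)
    chosen = [0]                                    # seed with the first sample
    dists = np.linalg.norm(flat - flat[0], axis=1)  # distance to the chosen set
    while len(chosen) < k:
        nxt = int(np.argmax(dists))                 # farthest from current set
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(flat - flat[nxt], axis=1))
    return samples[chosen]

# Usage: store compress(select_diverse(batch, 32))[0] in the replay buffer.
```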
Autonomous driving requires 3-D maps that provide accurate and up-to-date information about semantic landmarks. Since cameras offer wider availability and lower cost than laser scanners, vision-based mapping solutions, especially those using crowdsourced visual data, have attracted much attention from academia and industry. However, previous works have mainly focused on creating 3-D point clouds, leaving automatic change detection as an open issue. We propose a pipeline for initiating and updating 3-D maps with dashcam videos, with a focus on automatic change detection based on the comparison of metadata (e.g., the types and locations of traffic signs). To improve the performance of metadata generation, which depends on the accuracy of 3-D object detection and localization, we introduce a novel deep learning-based pixelwise 3-D localization algorithm. The algorithm, trained directly with Structure from Motion (SfM) point cloud data, accurately locates objects in 3-D space by estimating from monocular images not only the depth but also the lateral and height distances. In addition, we propose a point clustering and thresholding algorithm to improve the robustness of the system to errors. We performed experiments with different types of cameras and under varying lighting and weather conditions. Changes were detected with an average accuracy above 90%. The errors in the campus area were mainly due to traffic signs that were far from the vehicle and intended for pedestrians and cyclists only. We also conducted a cause analysis of the detection and localization errors to measure the impact of the performance of the underlying technologies.
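The clustering-and-thresholding step could look like the sketch below: per-frame 3-D detections of a sign are merged by greedy radius clustering, clusters with too few supporting views are discarded, and the surviving landmarks are matched against the map metadata to flag added or removed signs. The 2 m radius and vote threshold are illustrative values, and sign-type matching is omitted for brevity; neither reflects the paper's exact parameters.

```python
import numpy as np

def cluster_detections(points, radius=2.0, min_votes=3):
    """Greedy radius clustering of per-frame 3-D sign detections.
    Clusters with fewer than min_votes supporting views are rejected."""
    centers, remaining = [], list(range(len(points)))
    while remaining:
        i = remaining.pop(0)
        members = [i] + [j for j in remaining
                         if np.linalg.norm(points[j] - points[i]) < radius]
        remaining = [j for j in remaining if j not in members]
        if len(members) >= min_votes:        # thresholding suppresses outliers
            centers.append(points[members].mean(axis=0))
    return centers

def detect_changes(map_signs, observed, radius=2.0):
    """Map signs with no nearby observation are flagged as removed,
    and observations with no nearby map sign as added."""
    removed = [m for m in map_signs
               if all(np.linalg.norm(m - o) > radius for o in observed)]
    added = [o for o in observed
             if all(np.linalg.norm(o - m) > radius for m in map_signs)]
    return added, removed
```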
Deep learning has permitted unprecedented performance in sensor-based human activity recognition (HAR). However, deep learning models often present high computational overhead, which poses challenges to their deployment on resource-constrained devices such as microcontrollers. Usually, the computational overhead increases with the input size; one way to reduce the input size is to constrain the number of sensor channels. We refer to a sensor channel as a specific data modality (e.g., accelerometer) placed on a specific body location (e.g., chest). Identifying and removing irrelevant and redundant sensor channels via exhaustive search is feasible only when few candidates exist. In this paper, we propose a smarter and more efficient way to optimize the sensor channel selection during the training of deep neural networks for HAR. First, we propose a lightweight deep neural network architecture that learns to minimize the use of redundant and irrelevant information in the classification task while achieving high performance. Second, we propose a sensor channel selection algorithm that uses the knowledge learned by the neural network to rank the sensor channels by their contribution to the classification task. The neural network is then trimmed by removing the sensor channels with the least contribution from the input and pruning the corresponding weights that process them. This two-step pipeline iterates until the optimal set of sensor channels is found, balancing the trade-off between resource consumption and classification performance. Compared with other selection methods in the literature, experiments on 5 public datasets show that our proposal achieves significantly higher F1-scores while using 76% to 93% less memory, with up to 75% faster inference and up to 76% lower energy consumption.

CCS Concepts: • Human-centered computing → Ubiquitous and mobile computing; • Computing methodologies → Neural networks.
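A minimal sketch of the rank-and-trim loop is given below, assuming PyTorch. A learnable per-channel gate stands in for the paper's contribution measure; after training, channels with the smallest gate magnitudes are removed from the input and the corresponding convolution weights are pruned. All layer sizes and the gating mechanism are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GatedHARNet(nn.Module):
    """Backbone with a learnable per-channel gate acting as importance score."""
    def __init__(self, n_channels, n_classes):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(n_channels))
        self.backbone = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, n_classes))

    def forward(self, x):                    # x: (batch, channels, time)
        return self.backbone(x * self.gate.view(1, -1, 1))

def trim_channels(model, keep_idx):
    """Drop input channels and prune the conv weights that processed them."""
    conv = model.backbone[0]
    new_conv = nn.Conv1d(len(keep_idx), conv.out_channels, conv.kernel_size[0])
    new_conv.weight.data = conv.weight.data[:, keep_idx].clone()
    new_conv.bias.data = conv.bias.data.clone()
    model.backbone[0] = new_conv
    model.gate = nn.Parameter(model.gate.data[keep_idx].clone())
    return model

# One iteration of the loop: train, rank channels by |gate|, trim the weakest.
model = GatedHARNet(n_channels=9, n_classes=12)   # sizes are hypothetical
keep = torch.argsort(model.gate.abs(), descending=True)[:6]
model = trim_channels(model, keep)
```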