Previous research on 3D skeleton-based human action recognition has frequently relied on sequence-wise viewpoint normalization, which aligns the view directions of all segmented action sequences. This approach is typically robust to the viewpoint variations found in short-term videos, which are common in public datasets. However, our preliminary investigation of complex action sequences, such as discussions or smoking, reveals its limitations in capturing the intricacies of such actions. To address these view-dependency issues, we propose a simple yet effective sequence-wise augmentation technique: rotating human key points around either the z-axis or the spine vector to create variations in viewing direction. This strategy improves the robustness of action recognition models, particularly to changes in viewing direction within the horizontal plane (azimuth). We evaluate robustness to real-world viewpoint variations through extensive empirical studies on multiple public datasets, supplemented by an additional set of custom action sequences. Despite its simplicity, our approach consistently improves action recognition accuracy. Compared to sequence-wise viewpoint normalization applied to advanced deep learning models such as Conv1D, LSTM, and Transformer, our approach achieves a relative accuracy increase of 34.42% for z-axis rotation and 10.86% for spine-vector rotation.
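
The two rotation-based augmentations described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes key points are stored as NumPy arrays of 3D (x, y, z) coordinates, uses a standard rotation matrix for the z-axis case, and Rodrigues' formula for rotation about an arbitrary axis such as the spine vector. Function names and the uniform sampling of the azimuth angle are illustrative assumptions.

```python
import numpy as np

def rotate_about_z(points, angle_rad):
    """Rotate 3D key points about the global z-axis by angle_rad.

    points: array of shape (..., 3) holding (x, y, z) coordinates.
    """
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rz = np.array([[c, -s, 0.0],
                   [s,  c, 0.0],
                   [0.0, 0.0, 1.0]])
    return points @ rz.T

def rotate_about_axis(points, axis, angle_rad):
    """Rotate 3D key points about an arbitrary axis (e.g. the spine vector)
    using Rodrigues' rotation formula: R = I + sin(t)K + (1 - cos(t))K^2.
    """
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)
    k = np.array([[0.0, -u[2], u[1]],
                  [u[2], 0.0, -u[0]],
                  [-u[1], u[0], 0.0]])
    r = np.eye(3) + np.sin(angle_rad) * k + (1.0 - np.cos(angle_rad)) * (k @ k)
    return points @ r.T

def augment_sequence(seq, rng=None):
    """Sequence-wise augmentation: one random azimuth rotation is applied
    to the whole sequence, simulating a different viewing direction.
    seq: array of shape (frames, joints, 3).
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * np.pi)  # random viewing direction (assumed sampling)
    return rotate_about_z(seq, theta)
```

Because the same angle is used for every frame, the within-sequence motion is preserved; only the apparent camera azimuth changes, which is the variation the augmentation is meant to cover.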