“…For comparison purposes, all the neural networks used in our experiments employed the Adam optimizer [29]. Aside from the two architectures described earlier, namely the stacked-LSTM and the dual-stream LSTM, we use a CNN, a Conv-LSTM, and a Bi-LSTM proposed for HAR scenarios [24,22,17] for comparison. A hyper-parameter optimization experiment was conducted to find the best settings for each method: i) for the stacked-LSTM, three LSTM layers, each with 32 hidden units, followed by a dropout layer with probability 0.5 are used; ii) for the dual-stream LSTM, two stacks of three LSTM layers, with 24 and 8 hidden units per layer respectively, each followed by a dropout layer with probability 0.5, are used for the MoCap and sEMG streams; iii) for the CNN [24], three convolutional layers, each with 10 kernels of size 1 × 10 and followed by a 1 × 2 max-pooling layer, are used, with a softmax layer at the end for classification; iv) for the Conv-LSTM [22], the architecture of [22] is used, with 10 kernels of size 1 × 10 in each convolutional layer followed by max-pooling of size 1 × 2, and 32 hidden units in each LSTM layer; v) for the Bi-LSTM [17], three bidirectional LSTM layers with 16 hidden units each are used.…”
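To make the two in-house baselines concrete, the following is a minimal PyTorch sketch of the stacked-LSTM and dual-stream LSTM configurations described above, trained with Adam as the excerpt states. The input feature dimensions, number of classes, last-time-step readout, and concatenation-based fusion of the MoCap and sEMG streams are illustrative assumptions; the excerpt does not specify these details.

```python
import torch
import torch.nn as nn


class StackedLSTM(nn.Module):
    """Three LSTM layers (32 hidden units each) followed by dropout (p=0.5)
    and a linear classifier. Input shape: (batch, time, features)."""

    def __init__(self, n_features, n_classes):
        super().__init__()
        # num_layers=3 stacks three LSTM layers of 32 units each.
        self.lstm = nn.LSTM(n_features, 32, num_layers=3, batch_first=True)
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x):
        out, _ = self.lstm(x)            # (batch, time, 32)
        out = self.dropout(out[:, -1])   # last time step (assumed readout)
        return self.fc(out)              # logits; softmax lives in the loss


class DualStreamLSTM(nn.Module):
    """Two stacks of three LSTM layers (24 units/layer for MoCap, 8 for sEMG),
    each followed by dropout (p=0.5). Fusion by concatenation is an assumption."""

    def __init__(self, n_mocap, n_semg, n_classes):
        super().__init__()
        self.mocap = nn.LSTM(n_mocap, 24, num_layers=3, batch_first=True)
        self.semg = nn.LSTM(n_semg, 8, num_layers=3, batch_first=True)
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(24 + 8, n_classes)

    def forward(self, x_mocap, x_semg):
        m, _ = self.mocap(x_mocap)
        s, _ = self.semg(x_semg)
        fused = torch.cat([m[:, -1], s[:, -1]], dim=1)  # concatenate streams
        return self.fc(self.dropout(fused))


# Adam is the optimizer named in the text; feature/class counts are placeholders.
model = StackedLSTM(n_features=36, n_classes=10)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally
```

The CNN, Conv-LSTM, and Bi-LSTM baselines [24,22,17] would follow the same pattern, substituting `nn.Conv1d` with 1 × 10 kernels and 1 × 2 max-pooling, or `bidirectional=True` in `nn.LSTM`, per the settings listed above.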