Human activity recognition (HAR) is a prominent subfield of pervasive computing that provides context for many applications, such as healthcare, education, and entertainment. Most wearable HAR studies assume that the placement and orientation of the sensing device are fixed and never change. In real-world scenarios, however, this condition is not always guaranteed, and the resulting distortion degrades recognition results. To address this, we propose a new model based on a convolutional neural network that extracts robust features invariant to device placement and orientation, which are then used to train machine learning classifiers. We first carry out experiments to demonstrate the negative effects of this problem. We then apply the convolutional neural network-based hybrid structure to HAR. Results show that our method improves accuracy by 15% to 40% on a public data set and by 10% to 20% on our own data set, both with distortion.

Trans Emerging Tel Tech. 2020;31:e3823. wileyonlinelibrary.com/journal/ett