Continuous human activity recognition from inertial signals is typically performed by splitting the temporal signals into time windows and identifying the activity in each window. Choosing an appropriate window duration has been the focus of several previous works: in most of these analyses, recognition performance increases with window duration up to an optimal value and then decreases or saturates for longer windows. This paper evaluates several strategies for combining sub-window information within a window, obtaining substantial improvements for long windows. The evaluation was performed with a state-of-the-art human activity recognition system based on Convolutional Neural Networks (CNNs). This deep neural network includes convolutional layers that learn features from signal spectra and additional fully connected layers that classify the activity in each window. All analyses were carried out on two public datasets (PAMAP2 and USC-HAD) using Leave-One-Subject-Out (LOSO) cross-validation. For 10-s windows, accuracy increased from 90.1% (±0.66) to 94.27% (±0.46) on PAMAP2 and from 80.54% (±0.73) to 84.46% (±0.67) on USC-HAD. For 20-s windows, the improvements were from 92.66% (±0.58) to 96.35% (±0.38) (PAMAP2) and from 78.39% (±0.76) to 86.36% (±0.57) (USC-HAD).
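The windowing and sub-window combination described above can be sketched minimally as follows. The function names, window parameters, and the probability-averaging rule are illustrative assumptions, not the paper's exact method; the paper itself evaluates several combination strategies.

```python
import numpy as np

def sliding_windows(signal, win_len, hop):
    """Split a (T, C) inertial signal into fixed-length windows.
    Returns an array of shape (num_windows, win_len, C)."""
    starts = range(0, len(signal) - win_len + 1, hop)
    return np.stack([signal[s:s + win_len] for s in starts])

def combine_subwindows(sub_probs):
    """Combine per-sub-window class probabilities into one window-level
    prediction by averaging (one simple strategy among those that could
    be compared)."""
    return int(np.mean(sub_probs, axis=0).argmax())
```

For example, a 10-s window sampled at 100 Hz could be split into 2-s sub-windows, classified individually, and the resulting probability vectors averaged to label the full window.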