We present a bagging ensemble of convolutional networks combined with test-time augmentation to improve performance on the cross-dataset gender recognition problem. The bagging ensemble combines the predictions of multiple homogeneous models into a single ensemble prediction. Augmentation techniques are often used in the training phase of CNNs to improve generalization ability. Test-time augmentation, by contrast, is not commonly applied in the testing phase of the learned model. We conducted experiments on models trained using different hyperparameters. We augmented the test data and combined the predictive outputs from these network models. Experiments performed on diverse gender datasets, including Adience, AFAD, CelebA, Gallagher, Genki-4K, IMDb, LFW, Morph, VGGFace2, and Wiki, showed that a bagging ensemble of convolutional networks with test-time augmentation outperforms standalone models. We obtained the highest cross-dataset accuracy in the literature on seven out of eleven datasets. For the remaining four datasets, we report cross-dataset results for the first time. According to our experiments, the VGGFace2, IMDb, and CelebA datasets provided the highest cross-dataset classification results for most of the test datasets in the gender recognition problem.
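
As an illustration of how ensemble predictions and test-time augmentation can be combined, the following is a minimal sketch rather than the implementation used in our experiments. It assumes that each model maps an image batch to class probabilities and that the outputs are merged by simple averaging over every (model, augmentation) pair. The averaging rule, the horizontal-flip augmentation, and the names `tta_ensemble_predict` and `make_toy_model` are illustrative assumptions, and the toy models merely stand in for trained CNNs.

```python
import numpy as np


def identity(batch):
    """Return the batch unchanged (the un-augmented view)."""
    return batch


def horizontal_flip(batch):
    """Flip images along the width axis; batch shape is (N, H, W, C)."""
    return batch[:, :, ::-1, :]


def tta_ensemble_predict(models, batch, augmentations):
    """Average class probabilities over every (model, augmentation) pair.

    Assumption: plain averaging is used to combine the outputs.
    Returns the mean probabilities and the predicted class labels.
    """
    probs = [model(aug(batch)) for model in models for aug in augmentations]
    mean_probs = np.mean(probs, axis=0)
    return mean_probs, mean_probs.argmax(axis=1)


def make_toy_model(weight):
    """Hypothetical stand-in for a trained CNN with two output classes."""
    def model(batch):
        # The score depends on the left half of the image, so a flipped
        # view generally produces a different prediction.
        half = batch.shape[2] // 2
        left = batch[:, :, :half, :].mean(axis=(1, 2, 3))
        logits = np.stack([weight * left, np.zeros_like(left)], axis=1)
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)
    return model


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.random((4, 8, 8, 3))  # four random 8x8 RGB "images"
    ensemble = [make_toy_model(w) for w in (0.5, 1.0, 2.0)]
    probs, labels = tta_ensemble_predict(
        ensemble, images, [identity, horizontal_flip]
    )
    print(probs.shape, labels.shape)
```

In this sketch, each model in the ensemble sees both the original and the flipped view, so the final prediction is an average over six probability vectors per image. Other combination rules, such as majority voting, would fit the same structure.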