Handwriting recognition has drawn considerable attention for decades because of its many potential real-world applications. Research on unconstrained handwriting recognition has made substantial progress for some languages, but it lags behind for Bengali, even though Bengali is a major language spoken by about 230 million people in the Indian subcontinent and is the first and official language of Bangladesh. Recently, convolutional neural networks (CNNs) have been reported to achieve high accuracy in pattern recognition and computer vision problems. The main purpose of this study is to provide a CNN architecture that improves the accuracy of handwritten Bengali numeral recognition (HBNR) and to compare its performance with that of existing architectures. We propose a new CNN architecture, VGG-11M, which improves on an existing one (VGG-11). The normalized and rescaled images of each numeral were augmented by different transformation operations to increase the number of training samples and to add diversity to the dataset. The augmented images were then used to train the proposed VGG-11M model. The recognition accuracy of the developed system was tested on both the training and test sets of three publicly available handwritten Bengali numeral databases at different resolutions. Finally, the performance of the model was compared with that of four other architectures (LeNet-5, ResNet-50, VGG-11, and VGG-16). The highest accuracies, 99.80%, 99.66%, and 99.25%, were obtained using the proposed architecture on the test sets of the ISI, CMATERDB, and NUMTADB datasets, respectively, at a resolution of 32 × 32. The proposed VGG-11M outperformed the four existing CNN architectures on HBNR.
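The preprocessing and augmentation steps described above can be illustrated with a minimal sketch. The abstract does not specify the normalization scheme, the interpolation method, or which transformation operations were used, so everything here is an assumption made for illustration: division by 255, nearest-neighbour rescaling to 32 × 32, and small translations as the only augmentation. The function names (`normalize`, `rescale`, `shift`, `augment`) are hypothetical and are not taken from the study.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def normalize(img: Image, max_value: float = 255.0) -> Image:
    """Scale pixel values into [0, 1] (assumed normalization scheme)."""
    return [[p / max_value for p in row] for row in img]


def rescale(img: Image, size: int = 32) -> Image:
    """Resize to size x size with nearest-neighbour sampling (assumed method)."""
    h, w = len(img), len(img[0])
    return [
        [img[r * h // size][c * w // size] for c in range(size)]
        for r in range(size)
    ]


def shift(img: Image, dy: int, dx: int, fill: float = 0.0) -> Image:
    """Translate the image by (dy, dx) pixels, padding exposed cells with fill."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            rr, cc = r + dy, c + dx
            if 0 <= rr < h and 0 <= cc < w:
                out[rr][cc] = img[r][c]
    return out


def augment(img: Image, max_shift: int = 2) -> List[Image]:
    """Return the original image plus eight translated copies.

    Translation is only one possible transformation; it stands in for the
    unspecified operations mentioned in the abstract.
    """
    samples = [img]
    for dy in (-max_shift, 0, max_shift):
        for dx in (-max_shift, 0, max_shift):
            if (dy, dx) != (0, 0):
                samples.append(shift(img, dy, dx))
    return samples
```

Under these assumptions, a raw 64 × 64 numeral image would be passed through `normalize` and `rescale` once, and `augment` would then turn each resulting 32 × 32 image into nine training samples.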