“…In the learning-rate regime, we used batch size 64 while sweeping the learning rates [0.003, 0.002, 0.001, 0.0003, 0.0001, 0.00003, 0.00001]. In the batch-size regime, we used learning rate 0.0001 and batch sizes [8, 16, 32, 64, 128, 256, 512]. Cross-entropy loss was used with the Adam optimizer (β₁ = 0.9, β₂ = 0.999, ε = 1e−08).…”
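The two sweeps above can be sketched as a pair of configuration grids. This is a minimal, framework-agnostic illustration, not the authors' code; the dictionary keys and the `ADAM_KWARGS` name are assumptions for exposition, and the stated Adam hyperparameters happen to coincide with common library defaults (e.g. `torch.optim.Adam`).

```python
# Adam hyperparameters as stated in the text (these match PyTorch's defaults,
# e.g. torch.optim.Adam(params, lr=cfg["lr"], betas=(0.9, 0.999), eps=1e-8)).
ADAM_KWARGS = {"betas": (0.9, 0.999), "eps": 1e-8}

# Learning-rate regime: batch size fixed at 64, learning rate swept.
LR_SWEEP = [3e-3, 2e-3, 1e-3, 3e-4, 1e-4, 3e-5, 1e-5]
lr_regime = [{"lr": lr, "batch_size": 64, **ADAM_KWARGS} for lr in LR_SWEEP]

# Batch-size regime: learning rate fixed at 1e-4, batch size swept.
BS_SWEEP = [8, 16, 32, 64, 128, 256, 512]
bs_regime = [{"lr": 1e-4, "batch_size": bs, **ADAM_KWARGS} for bs in BS_SWEEP]

# Each regime varies exactly one hyperparameter while holding the other fixed,
# so the two grids isolate the effect of learning rate and batch size.
print(len(lr_regime), len(bs_regime))
```

Each entry in `lr_regime` or `bs_regime` would be passed to a single training run with cross-entropy loss; the two grids keep all other settings identical so the two factors can be studied independently.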