The ability of Graphics Processing Units (GPUs) to quickly train data- and compute-intensive deep networks has led to rapid advancements across diverse domains such as robotics, medical imaging and autonomous driving. However, the memory constraints of GPU-based training for memory-intensive deep networks have forced researchers to adopt various workarounds: 1) resize the input image, 2) divide the input image into smaller patches, or 3) use smaller batch sizes, in order to fit both the model and the batch of training data into GPU memory. While these alternatives perform well when dealing with natural images, they suffer from 1) loss of high-resolution information, 2) loss of global context and 3) sub-optimal batch sizes. Such issues are likely to become more pressing in domains like medical imaging, where data is scarce and images are often of very high resolution with subtle features. Therefore, in this paper, we demonstrate that training can be made more data-efficient by using a distributed training setup with high-resolution images and larger effective batch sizes, with batches being distributed across multiple nodes. The distributed GPU training framework, which partitions the data and shares only the model parameters across different GPUs, circumvents the memory constraints of single-GPU training. We conduct a study in which experiments are performed for different image resolutions (ranging from 112 × 112 to 1024 × 1024) and different numbers of images per class to determine the effect of image resolution on network performance. We illustrate our findings on two medical imaging datasets, namely the SD-198 skin-lesion dataset and the NIH Chest X-rays dataset.
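The abstract describes a data-parallel setup in which each GPU holds a full model replica, the training data is sharded across GPUs, and only model parameters (via gradient averaging) are exchanged. The snippet below is a minimal sketch of such a setup using PyTorch's DistributedDataParallel; the ResNet-50 backbone, the 1024 × 1024 resize, the per-GPU batch size, the data path, and all hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel.
# All model/dataset/hyperparameter choices below are illustrative placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler
from torchvision import datasets, transforms, models

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each process holds a full model copy; gradients are averaged across
    # processes during backward, so only model parameters are shared.
    model = models.resnet50(num_classes=198).cuda(local_rank)  # e.g. SD-198 classes
    model = DDP(model, device_ids=[local_rank])

    # The sampler partitions the dataset so every GPU sees a disjoint shard,
    # giving an effective batch size of per_gpu_batch * world_size.
    transform = transforms.Compose([
        transforms.Resize((1024, 1024)),  # high-resolution input (illustrative)
        transforms.ToTensor(),
    ])
    dataset = datasets.ImageFolder("data/train", transform=transform)  # hypothetical path
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=4, sampler=sampler, num_workers=4)

    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    for epoch in range(10):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for images, labels in loader:
            images = images.cuda(local_rank, non_blocking=True)
            labels = labels.cuda(local_rank, non_blocking=True)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()  # gradients all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Under these assumptions, the script would be launched with one process per GPU, e.g. `torchrun --nproc_per_node=4 --nnodes=2 train.py`, so the effective batch size scales with the total number of GPUs across nodes while each GPU only needs to hold its own shard of the batch.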