Biased outcomes in machine learning models can arise from various factors, including limited training data, imbalanced class distributions, suboptimal training methodologies, and overfitting. When neural networks are trained with Stochastic Gradient Descent (SGD) and backpropagation, the choice of hyperparameters such as learning rate and momentum strongly influences the model's performance. A comprehensive grid search study was conducted using static hyperparameters with standard SGD and dynamic hyperparameters with the Adam optimizer. The investigation covered three tasks (classification, segmentation, and detection) and four image-based applications: digit classification using the MNIST dataset, tuberculosis (TB) detection from chest X-ray images, lung segmentation in chest X-ray images, and the detection of malaria parasites in blood smear images. In the first comparative study, on the MNIST dataset, SGD consistently outperformed Adam as noise levels increased. SGD held a slight advantage in accuracy over Adam in a noise-free environment, and this advantage became more apparent as noise was introduced and increased to moderate levels. At high noise levels, both algorithms experienced a significant decline in performance, yet SGD maintained relatively better accuracy than Adam. This trend underscored SGD's superior ability to generalize across varying noise conditions. In the second application, TB detection with the DenseNet121 architecture, Adam performed better on the larger TBX11K dataset, whereas SGD outperformed Adam on a smaller subset of TBX11K. In the third comparative study, lung segmentation using COVID-19 and NLM datasets, SGD slightly outperformed Adam based on the mean and Hausdorff distances.
In the case of malaria parasite detection using the YOLOv8 architecture, the SGD and Adam optimizers showed varying performance across different conditions. Initially, both achieved high accuracies on the full dataset, with Adam slightly outperforming SGD. However, as the dataset size decreased, SGD maintained more consistent performance, while Adam's accuracy fluctuated significantly. In noise tests, both showed equal accuracy on clean data, but under noisy conditions SGD maintained higher accuracies, suggesting better generalization capabilities. Further experiments assessed how well SGD and Adam could cope with domain shifts, particularly when using data from different countries. SGD generalized better than Adam in these experiments. In summary, although Adam is arguably the more popular of the two optimization techniques, SGD held its own in the experiments and showed superior performance when it came to generalization or classification under noisy conditions.
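To make the comparison concrete, the following is a minimal sketch of the two update rules and of a grid search over learning rate and momentum for SGD. It is not the setup used in the study: a toy one-dimensional quadratic loss stands in for a network loss, and all names (`loss_and_grad`, `run_sgd`, `run_adam`, `grid_search_sgd`) as well as the grid values and step counts are assumptions chosen for illustration only.

```python
import itertools
import math


def loss_and_grad(w):
    # Toy loss (w - 3)^2 and its gradient; a stand-in for a real network loss.
    return (w - 3.0) ** 2, 2.0 * (w - 3.0)


def run_sgd(lr, momentum, steps=200):
    # SGD with classical momentum: the learning rate and momentum stay fixed.
    w, v = 0.0, 0.0
    for _ in range(steps):
        _, g = loss_and_grad(w)
        v = momentum * v - lr * g
        w += v
    return loss_and_grad(w)[0]


def run_adam(lr, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    # Adam: the effective step size adapts via running moment estimates.
    w, m, s = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        _, g = loss_and_grad(w)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        s = beta2 * s + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        s_hat = s / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(s_hat) + eps)
    return loss_and_grad(w)[0]


def grid_search_sgd(lrs, momenta):
    # Evaluate every (learning rate, momentum) pair and return the best one.
    results = {
        (lr, mu): run_sgd(lr, mu)
        for lr, mu in itertools.product(lrs, momenta)
    }
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = grid_search_sgd([0.001, 0.01, 0.1], [0.0, 0.5, 0.9])
    print("best SGD setting (lr, momentum):", best)
    print("final SGD loss:", results[best])
    print("final Adam loss (lr=0.1):", run_adam(0.1))
```

In a real experiment the final loss would be replaced by a validation metric (for example accuracy or Hausdorff distance) measured on held-out, noisy, or domain-shifted data.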