The COVID-19 pandemic has affected many aspects of our lives, including education. Because of this unexpected catastrophe, education shifted to virtual-learning and auto-grading models in most parts of the world. This study explores the validity and appropriateness of auto-graded assessment for online exams by comparing students' online exam scores when they were first auto-graded and then manually graded. Furthermore, it investigates whether the differences in their mean scores were statistically significant. The study covered two calculus courses taught by the authors during the spring semester of the 2019-2020 academic year at a private university in Saudi Arabia. The online exam was administered on the WebAssign platform, which has built-in calculus questions. The sample consisted of fifty-five students who were registered in those calculus courses. The quantitative data were analysed using the SPSS statistical package. A paired t-test at an alpha level of 0.05 was performed on the differences between the auto-graded and manually graded exam scores. The statistical analysis revealed a statistically significant difference in students' mean scores. Our findings illustrate the importance of human intelligence, its role in assessing students' achievement and understanding of mathematical concepts, and the extent to which instructors can currently rely on auto-grading. A careful manual review of the auto-graded exams revealed different types of mistakes committed by students. Those mistakes were classified into two categories: non-mathematical mistakes (related to platform design) and minor mathematical mistakes, which might deserve partial credit. The study indicated a need to reform the auto-grading system and offered some suggestions for overcoming its shortcomings.
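
For readers who wish to reproduce this kind of comparison outside SPSS, the following Python sketch shows how a paired t-test at an alpha level of 0.05 could be run on auto-graded versus manually graded scores for the same students. The scores and sample size below are hypothetical placeholders, not the study's data.

```python
# Illustrative sketch only: paired t-test comparing hypothetical
# auto-graded and manually graded scores for the same students.
# The study itself used SPSS; this is not the authors' code.
from scipy import stats

# Hypothetical scores for five students (placeholder values).
auto_graded = [72.0, 65.5, 80.0, 58.0, 90.0]
manually_graded = [75.0, 68.0, 82.5, 61.0, 90.0]

# Paired (dependent-samples) t-test on the score differences.
t_statistic, p_value = stats.ttest_rel(manually_graded, auto_graded)

alpha = 0.05
print(f"t = {t_statistic:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("The mean difference is statistically significant at alpha = 0.05.")
else:
    print("The mean difference is not statistically significant at alpha = 0.05.")
```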