A problem faced by many instructors is designing exams that accurately assess students' abilities. Typically, these exams are prepared several days in advance, and generic question scores are assigned based on a rough approximation of each question's difficulty and length. For example, in a recent class taught by the author, there were 30 multiple-choice questions worth 3 points each, 15 true/false-with-explanation questions worth 4 points each, and 5 analytical exercises worth 10 points each. We describe a novel framework in which machine learning algorithms are used to adjust the exam question weights so as to optimize the exam scores, using the overall final score as a proxy for a student's true ability. We show that our approach yields a significant error reduction over standard weighting schemes. For example, with the linear regression approach, the mean absolute error of prediction decreases by 90.58% for the final exam and by 97.70% for the midterm exam, resulting in better estimation. We also make several new observations about the properties of "good" and "bad" exam questions that may inform the design of improved evaluation methods.
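The core idea, learning question weights by regressing the overall final score on per-question scores, can be illustrated with a small sketch. This is not the author's implementation: the abstract does not specify features, intercepts, normalization, or train/test splits. Here we assume (hypothetically) that each feature is the fraction of credit a student earned on one question, that the target is the overall final score on a 0-100 scale, that weights are fit by ordinary least squares without an intercept, and that mean absolute error (MAE) is measured in-sample. The baseline uses the fixed 3/4/10-point weights from the example above, rescaled to a percentage. All function names and the five-student dataset are invented for illustration.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear_system(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: question columns are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_weights(X: Matrix, y: List[float]) -> List[float]:
    """Least-squares question weights via the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(X: Matrix, w: List[float]) -> List[float]:
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


def mean_absolute_error(pred: List[float], y: List[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, y)) / len(y)


# Hypothetical data: one multiple-choice, one true/false, one analytical question.
# Each entry is the fraction of credit earned on that question.
X = [
    [1.0, 0.5, 0.2],
    [0.8, 1.0, 0.9],
    [0.6, 0.75, 0.4],
    [0.9, 0.25, 1.0],
    [0.4, 0.5, 0.6],
]
y = [34.0, 91.0, 49.0, 84.0, 56.0]  # hypothetical overall final scores

# Baseline: fixed point values (3, 4, 10) rescaled to a 0-100 percentage.
points = [3.0, 4.0, 10.0]
baseline_w = [100.0 * p / sum(points) for p in points]
baseline_mae = mean_absolute_error(predict(X, baseline_w), y)

learned_w = fit_weights(X, y)
learned_mae = mean_absolute_error(predict(X, learned_w), y)

if __name__ == "__main__":
    print("baseline MAE:", round(baseline_mae, 2))
    print("learned weights:", [round(w, 3) for w in learned_w])
    print("learned MAE:", round(learned_mae, 6))
```

Because the toy targets were constructed as an exact linear combination of the question scores, the fitted weights recover them and the in-sample MAE drops to essentially zero; on real exam data, a held-out evaluation would be needed to make a fair comparison against the fixed weighting.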