Machine learning (ML) based mortality prediction models can be immensely useful in intensive care units. Such a model should generate warnings to alert physicians when a patient's condition rapidly deteriorates or vital signs fall into highly abnormal ranges. Before clinical deployment, it is important to comprehensively assess a model's ability to recognize critical patient conditions. We develop testing approaches that systematically assess how models respond to serious medical emergencies. Using generated test cases, we find that statistical machine-learning models trained solely on patient data are grossly insufficient and have many dangerous blind spots. Specifically, we identify serious deficiencies in the models' responsiveness, i.e., an inability to recognize severely impaired medical conditions or rapidly deteriorating health. For in-hospital mortality prediction, the models tested with our synthesized cases fail to recognize 66% of the injury-related test cases. In some instances, the models fail to generate adequate mortality risk scores for any of the test cases. We also apply our testing methods to assess the responsiveness of 5-year breast and lung cancer prediction models and identify similar deficiencies.