Understanding the driver's cognitive load is important for evaluating in-vehicle user interfaces. This paper describes experiments to assess machine learning classification algorithms on their ability to automatically identify elevated cognitive workload levels in drivers, leading towards the development of robust tools for automobile user interface evaluation. We look at using both driver performance as well as physiological data. These measures can be collected in real-time and do not interfere with the primary task of driving the vehicle. We report classification accuracies of up to 90% for detecting elevated levels of cognitive load, and show that the inclusion of physiological data leads to higher classification accuracy than vehicle sensor data evaluated alone. Finally, we show results suggesting that models can be built to classify cognitive load across individuals, instead of building individual models for each person. By collecting data from drivers in two large field studies on the highway (20 drivers and 99 drivers), this work extends prior work and demonstrates feasibility and potential of such measures for HCI research in vehicles.