Autonomous vehicles perceive the road environment through a perception pipeline fed by a variety of sensor modalities, including camera, lidar, radar, infrared, gated camera, and ultrasonic sensors. The vehicle computer must fuse these sensors reliably in environments where sensor signals degrade, which can be done parametrically or with machine learning. We study sensor fusion of a forward-facing camera and a lidar sensor using convolutional neural networks (CNNs) for drivable area detection in winter driving, where roads are partially covered with snow and tire tracks obscure the lane markers. Seven models are designed and evaluated on their ability to classify pixels into two classes, drivable and nondrivable: camera only, lidar only, early fusion, three variants of intermediate fusion, and late fusion. The models achieve accuracies between 84% and 89%, with runtimes between 23 and 61 ms. To select the best model in this group, we introduce a new metric, the normalized accuracy runtime (NAR) score: within a group, a network receives a higher NAR score the more accurate it is and the shorter its runtime. The models are evaluated on the winter driving DENSE dataset, and synthetic fog noise is added to the camera and lidar data to examine model robustness.
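
The abstract does not state the exact NAR formula; as a rough illustration only, the sketch below assumes a min-max normalization of accuracy and runtime over the candidate group with equal weighting of the two terms. The function name `nar_scores`, the normalization scheme, and the weights are illustrative assumptions, not the paper's definition.

```python
from typing import Dict, List, Tuple


def nar_scores(models: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Illustrative normalized accuracy runtime (NAR) scores for a group of models.

    Each model is given as (name, accuracy, runtime_ms). This sketch min-max
    normalizes accuracy and runtime over the group and rewards high accuracy
    and low runtime equally; the paper's actual formula may differ.
    """
    accs = [acc for _, acc, _ in models]
    times = [rt for _, _, rt in models]
    a_min, a_max = min(accs), max(accs)
    t_min, t_max = min(times), max(times)

    scores: Dict[str, float] = {}
    for name, acc, rt in models:
        # Normalized accuracy: 1 for the most accurate model in the group, 0 for the least.
        a_norm = (acc - a_min) / (a_max - a_min) if a_max > a_min else 1.0
        # Normalized runtime: 1 for the fastest model in the group, 0 for the slowest.
        t_norm = (t_max - rt) / (t_max - t_min) if t_max > t_min else 1.0
        scores[name] = 0.5 * (a_norm + t_norm)  # assumed equal weighting
    return scores


# Example using the accuracy/runtime ranges quoted in the abstract (values hypothetical).
print(nar_scores([("camera_only", 0.84, 23.0),
                  ("intermediate_fusion", 0.88, 40.0),
                  ("late_fusion", 0.89, 61.0)]))
```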