Online performance prediction (or observation) of deep neural networks (DNNs) in highly automated driving remains an unsolved task, as most DNNs are evaluated offline on datasets with ground truth labels. In practice, however, DNN performance depends on the camera type used, on lighting and weather conditions, and on various other kinds of domain shift. Moreover, the input to DNN-based perception systems can be perturbed by adversarial attacks, requiring means to detect such input perturbations. In this work, we propose a method to mitigate these problems via a multi-task learning approach with monocular depth estimation as a secondary task, which enables us to predict the DNN's performance on various other (primary) tasks by evaluating only the depth estimation task against a physical depth measurement provided, e.g., by a LiDAR sensor. We demonstrate the effectiveness of our method for the primary task of semantic segmentation across various training datasets, test datasets, model architectures, and input perturbations. Our method provides an effective way to predict (observe) the performance of DNNs for semantic segmentation even on a single-image basis and transfers to other primary DNN-based perception tasks in a straightforward manner.
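The core idea described above can be sketched as follows: compute a per-image depth-error metric between the secondary-task output and the LiDAR measurement, then map that error to an estimate of primary-task quality (e.g., mIoU for semantic segmentation). This is a minimal illustrative sketch, not the paper's implementation; all function names are hypothetical, and the linear error-to-quality mapping is one simple modeling choice among many.

```python
import numpy as np

def depth_error_proxy(pred_depth, lidar_depth, valid_mask):
    """Mean absolute depth error, evaluated only on pixels that have a
    physical LiDAR measurement (LiDAR point clouds are sparse in image space)."""
    return float(np.mean(np.abs(pred_depth[valid_mask] - lidar_depth[valid_mask])))

def fit_performance_model(depth_errors, primary_scores):
    """Fit a simple linear map from observed depth errors to primary-task
    scores (e.g., mIoU), using a labeled calibration set.
    A linear fit is a hypothetical choice for illustration."""
    slope, intercept = np.polyfit(depth_errors, primary_scores, deg=1)
    return slope, intercept

def predict_primary_score(depth_error, slope, intercept):
    """Online performance prediction for a single image: no ground-truth
    labels for the primary task are needed, only the depth error."""
    return slope * depth_error + intercept
```

A larger depth error on an incoming image (e.g., under domain shift or an adversarial perturbation) then translates into a lower predicted primary-task score, which is exactly the single-image observation capability described above.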