DADA: Depth-Aware Domain Adaptation in Semantic Segmentation

Vu, Tuan-Hung; Jain, Himalaya; Bucher, Maxime; Cord, Matthieu; Pérez, Patrick

doi:10.1109/iccv.2019.00746

Cited by 184 publications (157 citation statements)

References 38 publications (65 reference statements)

Citation statements by type: Mentioning 155


“…Multi-task learning (MTL) is defined as the joint learning process of several tasks at once by either learning a shared feature representation [43], [44], or by implementing cross-task consistency checks into the training process [45], [46]. Depth estimation as used in this work has been shown to take and give profit through MTL with other tasks as, e.g., semantic segmentation [44], [47], [48], domain adaptation [7], optical flow estimation [49], [50], or 3D pose estimation [4], [24]. Particularly, self-supervised depth estimation has been combined with semantic segmentation [46], [51]- [53] or instance segmentation [36], [54] to mitigate the effect of moving objects, which violate the static world assumption made during training of such models.…”

Section: B Multi-task Learning (mentioning, confidence: 99%)

“…which considers the number of true positives ($TP_s$), false negatives ($FN_s$), and false positives ($FP_s$) between the estimated segmentation mask $\mathbf{m}$ and the ground truth segmentation mask $\overline{\mathbf{m}}$ for each class $s$. Note that $TP_s$, $FN_s$ and $FP_s$ are calculated over an entire test set and only afterwards the mIoU is obtained according to (7). For online performance observation, however, the segmentation performance has to be predicted and evaluated on a single-image basis (image index $n$) in order to be real-time capable.…”

Section: Performance Evaluation Metrics (mentioning, confidence: 99%)
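A minimal sketch of the dataset-level mIoU described in this excerpt, in which $TP_s$, $FP_s$ and $FN_s$ are summed over the whole test set before the per-class IoU is formed. It assumes the standard definition, mIoU as the mean over classes $s$ of $TP_s / (TP_s + FP_s + FN_s)$, since the cited equation (7) is not reproduced here. The ignore label 255, the rule of skipping classes that never occur, and the function names are illustrative assumptions, not taken from the source.

```python
import numpy as np

def class_counts(pred, gt, num_classes, ignore_index=255):
    """Per-class TP, FP, FN for one image; labels assumed to lie in [0, num_classes)."""
    valid = gt != ignore_index                  # drop unlabeled pixels (assumed convention)
    g = gt[valid].astype(np.int64)
    p = pred[valid].astype(np.int64)
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    cm = np.bincount(g * num_classes + p, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp                    # predicted as s, labeled as another class
    fn = cm.sum(axis=1) - tp                    # labeled as s, predicted as another class
    return tp, fp, fn

def dataset_miou(preds, gts, num_classes, ignore_index=255):
    """mIoU with TP_s, FP_s, FN_s accumulated over the entire test set first."""
    tp = np.zeros(num_classes, dtype=np.int64)
    fp = np.zeros(num_classes, dtype=np.int64)
    fn = np.zeros(num_classes, dtype=np.int64)
    for pred, gt in zip(preds, gts):
        t, f_p, f_n = class_counts(pred, gt, num_classes, ignore_index)
        tp += t
        fp += f_p
        fn += f_n
    denom = tp + fp + fn
    present = denom > 0                         # skip classes that never occur (assumption)
    return float((tp[present] / denom[present]).mean())

# Toy data: two 2x2 label maps with two classes.
gts = [np.array([[0, 0], [1, 1]]), np.array([[1, 1], [1, 1]])]
preds = [np.array([[0, 1], [1, 1]]), np.array([[1, 1], [1, 0]])]
print(dataset_miou(preds, gts, num_classes=2))  # 11/21, about 0.524
```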

“…For online performance observation, however, the segmentation performance has to be predicted and evaluated on a single-image basis (image index $n$) in order to be real-time capable. Therefore, for the scope of this work, we calculate the mIoU metric imagewise ($\mathrm{mIoU}_n$) according to (7), and afterwards the average value of the $\mathrm{mIoU}_n$ over all images is taken as mIoU. This makes it possible to predict and evaluate the semantic segmentation performance on a single image $\mathbf{x}$, without changing the interpretation of the metric when reporting the overall performance on a whole test set.…”

Section: Performance Evaluation Metrics (mentioning, confidence: 99%)
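The image-wise variant in this excerpt can be sketched as follows: $\mathrm{mIoU}_n$ is computed from the counts of image $n$ alone, and the test-set figure is the plain mean of the $\mathrm{mIoU}_n$ values. This assumes the standard IoU definition for the cited equation (7), a hypothetical ignore label 255, averaging only over classes that appear in the image's ground truth or prediction, and at least one labeled pixel per image; the function names are illustrative.

```python
import numpy as np

def image_miou(pred, gt, num_classes, ignore_index=255):
    """mIoU_n of one image, using only that image's TP/FP/FN counts."""
    valid = gt != ignore_index                  # drop unlabeled pixels (assumed convention)
    g = gt[valid].astype(np.int64)
    p = pred[valid].astype(np.int64)
    cm = np.bincount(g * num_classes + p, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)   # rows = ground truth, columns = prediction
    tp = np.diag(cm)
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp  # equals TP + FP + FN per class
    present = denom > 0                         # classes seen in gt or prediction of this image
    return float((tp[present] / denom[present]).mean())

def mean_imagewise_miou(preds, gts, num_classes, ignore_index=255):
    """Average of the per-image mIoU_n values over the test set."""
    scores = [image_miou(p, g, num_classes, ignore_index) for p, g in zip(preds, gts)]
    return float(np.mean(scores))

# Toy data: two 2x2 label maps with two classes.
gts = [np.array([[0, 0], [1, 1]]), np.array([[1, 1], [1, 1]])]
preds = [np.array([[0, 1], [1, 1]]), np.array([[1, 1], [1, 0]])]
print(mean_imagewise_miou(preds, gts, num_classes=2))  # 23/48, about 0.479
```

On this toy data the per-image values are 7/12 and 3/8, so their mean is 23/48, whereas pooling the counts over both images before dividing would give 11/21 (about 0.524). The per-class IoU terms mean the same thing in both cases, but the two aggregations generally differ numerically.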

“…For example, when the predicted performance is too low, the high-level planning could decide not to trust the current information from the environment perception. This is of special importance when considering that in practice the DNN performance is often very sensitive to changes of the environment, which have not been included into the dataset, the neural network was trained on [7], [8]. Such changes include, e.g., a different camera type, different lighting or weather conditions, various other kinds of domain shift, or even directed adversarial attacks [9], [10], which are difficult to detect on the input image.…”

Mentioning (confidence: 99%)


Online Performance Prediction of Perception DNNs by Multi-Task Learning With Depth Estimation

Klingner; Fingscheidt (2021), IEEE Trans. Intell. Transport. Syst.


Online performance prediction (or: observation) of deep neural networks (DNNs) in highly automated driving has remained an unsolved task until now, as most DNNs are evaluated offline, which requires datasets with ground truth labels. In practice, however, DNN performance depends on the camera type used, on lighting and weather conditions, and on various other kinds of domain shift. Also, the input to DNN-based perception systems can be perturbed by adversarial attacks, which calls for means to detect these input perturbations. In this work, we propose a method to mitigate these problems by a multi-task learning approach with monocular depth estimation as a secondary task, which enables us to predict the DNN's performance for various other (primary) tasks by evaluating only the depth estimation task against a physical depth measurement provided, e.g., by a LiDAR sensor. We show the effectiveness of our method for the primary task of semantic segmentation using various training datasets, test datasets, model architectures, and input perturbations. Our method provides an effective way to predict (observe) the performance of DNNs for semantic segmentation even on a single-image basis and is transferable to other primary DNN-based perception tasks in a straightforward manner.
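The central idea of this abstract, judging segmentation quality through an auxiliary depth output that can be checked against LiDAR, can be sketched in a few lines. Both the depth metric (mean absolute relative error on pixels with a LiDAR return) and the linear mapping from depth error to mIoU, fitted on a labeled calibration set, are assumptions chosen for illustration; the paper's actual metric and predictor may differ, and all function names and numbers are hypothetical.

```python
import numpy as np

def abs_rel_error(pred_depth, lidar_depth):
    """Mean absolute relative depth error on pixels with a LiDAR return (depth > 0)."""
    valid = lidar_depth > 0                     # sparse LiDAR: 0 marks "no measurement"
    d_hat = pred_depth[valid]
    d = lidar_depth[valid]
    return float(np.mean(np.abs(d_hat - d) / d))

def fit_performance_predictor(depth_errors, mious):
    """Least-squares line mIoU ~ slope * depth_error + intercept (hypothetical calibration)."""
    slope, intercept = np.polyfit(depth_errors, mious, deg=1)
    return float(slope), float(intercept)

def predict_miou(depth_error, slope, intercept):
    """Predicted mIoU for a new image, clipped to the valid range [0, 1]."""
    return float(np.clip(slope * depth_error + intercept, 0.0, 1.0))

# Toy usage with made-up numbers (not results from the paper).
pred = np.array([[10.0, 20.0], [5.0, 40.0]])
lidar = np.array([[10.0, 25.0], [0.0, 50.0]])   # one pixel has no LiDAR return
err = abs_rel_error(pred, lidar)                # (0 + 0.2 + 0.2) / 3
slope, intercept = fit_performance_predictor([0.1, 0.2, 0.3], [0.7, 0.6, 0.5])
print(err, predict_miou(err, slope, intercept))
```

Only the depth branch needs a reference signal at run time, which is what makes single-image observation possible without segmentation labels; the calibration set with segmentation ground truth is needed once, offline.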


“…There are also recent improvements in language understanding with Bidirectional Encoder Representations from Transformers (BERT) [9] and A Robustly Optimized BERT Pretraining Approach (RoBERTa) [10], and, adding to that, the recent breakthrough in task agnostic transfer learning by Howard et al [11]. In CV, DL has advanced, inter alia, the tasks of image classification [12,13], object-detection [14][15][16], object-tracking [17], pose estimation [18][19][20][21], superresolution [22], and semantic segmentation [23][24][25][26][27][28]. These advancements give rise to new applications in, e.g., solid-state materials science and chemical sciences [29,30], meteorology [31], medicine [32][33][34][35][36][37][38][39], seismology [40][41][42], biology [43], life sciences in general [44], chemistry [45], and physics [46][47][48][49][50][51][52]…”

Section: Introduction (mentioning, confidence: 99%)

Generalized Sparse Convolutional Neural Networks for Semantic Segmentation of Point Clouds Derived from Tri-Stereo Satellite Imagery

et al. (2020)


We studied the applicability of point clouds derived from tri-stereo satellite imagery to semantic segmentation with generalized sparse convolutional neural networks, using an Austrian study area as an example. In particular, we examined whether the distorted geometric information, in addition to color, influences the performance of segmenting clutter, roads, buildings, trees, and vehicles. To this end, we trained a fully convolutional neural network that uses generalized sparse convolution once solely on 3D geometric information (i.e., a 3D point cloud derived by dense image matching) and twice on 3D geometric as well as color information. In the first experiment we did not use class weights, whereas in the second we did. We compared the results with a fully convolutional neural network trained on a 2D orthophoto, and with a decision tree trained once on hand-crafted 3D geometric features and once on hand-crafted 3D geometric as well as color features. The decision tree using hand-crafted features has been successfully applied to aerial laser scanning data in the literature. Hence, we compared our main object of study, a representation learning technique, with another representation learning technique and with a non-representation learning technique. Our study area is located in Waldviertel, a region in Lower Austria. The territory is hilly and covered mainly by forests, agriculture, and grasslands. Our classes of interest are heavily imbalanced. However, we did not use any data augmentation techniques to counter overfitting. For our study area, we found that combining geometric and color information improves the performance of the Generalized Sparse Convolutional Neural Network (GSCNN) only on the dominant class, which leads to a higher overall performance in our case. We also found that training the network with median class weighting partially reverts the effects of adding color.
The network also started to learn the less frequent classes. The fully convolutional neural network trained on the 2D orthophoto generally outperforms the other two, with a kappa score of over 90% and an average per-class accuracy of 61%. However, the decision tree trained on colors and hand-crafted geometric features has a 2% higher accuracy for roads.
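The class weighting and the evaluation measures named in this abstract can be illustrated with a few helpers. The weighting is a simplified median frequency balancing (each class weighted by the median class frequency divided by its own frequency, counted over all points), which is assumed here to be what "median class weighting" refers to; the kappa score is taken to be Cohen's kappa computed from the confusion matrix, and the average per-class accuracy is the mean per-class recall. Function names are hypothetical and this is not the authors' code.

```python
import numpy as np

def median_frequency_weights(labels, num_classes):
    """Simplified median frequency balancing: w_c = median(freq) / freq_c."""
    counts = np.bincount(np.asarray(labels).ravel(), minlength=num_classes)
    freq = counts / counts.sum()
    present = freq > 0                          # absent classes keep weight 0
    weights = np.zeros(num_classes)
    weights[present] = np.median(freq[present]) / freq[present]
    return weights

def confusion_matrix(gt, pred, num_classes):
    """Rows = ground-truth class, columns = predicted class."""
    idx = np.asarray(gt).ravel() * num_classes + np.asarray(pred).ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def cohens_kappa(cm):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = cm.sum()
    p_o = np.trace(cm) / n                                  # observed accuracy
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2  # agreement expected by chance
    return float((p_o - p_e) / (1 - p_e))

def mean_per_class_accuracy(cm):
    """Mean per-class recall over classes present in the ground truth."""
    support = cm.sum(axis=1)
    has = support > 0
    return float((np.diag(cm)[has] / support[has]).mean())

labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])       # one dominant class
print(median_frequency_weights(labels, 3))               # approx. [0.333, 1.0, 1.0]
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], 2)
print(cohens_kappa(cm), mean_per_class_accuracy(cm))     # 0.5 0.75
```

Down-weighting the dominant class in the loss is what lets a network start predicting rare classes at all, which matches the abstract's observation; kappa and per-class accuracy are reported alongside overall accuracy because both are less dominated by the majority class.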


Unsupervised Domain Adaptation for Semantic Segmentation of NIR Images Through Generative Latent Search

Pandey; Tyagi; Ambekar; et al. (2020), Lecture Notes in Computer Science

