Online performance prediction (or observation) of deep neural networks (DNNs) in highly automated driving remains an unsolved task, as most DNNs are evaluated offline on datasets with ground truth labels. In practice, however, DNN performance depends on the camera type used, on lighting and weather conditions, and on various other kinds of domain shift. Moreover, the input to DNN-based perception systems can be perturbed by adversarial attacks, requiring means to detect such input perturbations. In this work, we propose a method to mitigate these problems via a multi-task learning approach with monocular depth estimation as a secondary task, which enables us to predict the DNN's performance on various other (primary) tasks by evaluating only the depth estimation task against a physical depth measurement provided, e.g., by a LiDAR sensor. We demonstrate the effectiveness of our method for the primary task of semantic segmentation across various training datasets, test datasets, model architectures, and input perturbations. Our method provides an effective way to predict (observe) the performance of DNNs for semantic segmentation even on a single-image basis and transfers to other primary DNN-based perception tasks in a straightforward manner.
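The core idea described above can be sketched as follows: compute a per-image depth-error metric between the secondary-task output and the LiDAR measurement, then map that error to an estimate of primary-task quality (e.g., mIoU for semantic segmentation). This is a minimal illustrative sketch, not the paper's implementation; all function names are hypothetical, and the linear error-to-quality mapping is one simple modeling choice among many.

```python
import numpy as np

def depth_error_proxy(pred_depth, lidar_depth, valid_mask):
    """Mean absolute depth error, evaluated only on pixels that have a
    physical LiDAR measurement (LiDAR point clouds are sparse in image space)."""
    return float(np.mean(np.abs(pred_depth[valid_mask] - lidar_depth[valid_mask])))

def fit_performance_model(depth_errors, primary_scores):
    """Fit a simple linear map from observed depth errors to primary-task
    scores (e.g., mIoU), using a labeled calibration set.
    A linear fit is a hypothetical choice for illustration."""
    slope, intercept = np.polyfit(depth_errors, primary_scores, deg=1)
    return slope, intercept

def predict_primary_score(depth_error, slope, intercept):
    """Online performance prediction for a single image: no ground-truth
    labels for the primary task are needed, only the depth error."""
    return slope * depth_error + intercept
```

A larger depth error on an incoming image (e.g., under domain shift or an adversarial perturbation) then translates into a lower predicted primary-task score, which is exactly the single-image observation capability described above.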