LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation

Zhao, Lin; Zhou, Hui; Zhu, Xinge; Song, Xiao; Li, Hongsheng; Tao, Wenbing

doi:10.48550/arxiv.2108.07511

Cited by 7 publications

(7 citation statements)

References 47 publications


“…Fusion has been studied for a number of LiDAR based 3D perception tasks in a supervised and weakly-supervised manner [1,4,18,21,41,42]. For LiDAR semantic segmentation PMF [44] and LIF-Seg [40] fuse the information from streams that process each modality individually to obtain higher information yielding features. However such approaches not only require image information during inference but also have linearly increasing memory and computation cost.…”

Section: Related Work (mentioning)

confidence: 99%

Scribble-Supervised LiDAR Semantic Segmentation

Unal

Dai

Gool

2022

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


Densely annotating LiDAR point clouds remains too expensive and time-consuming to keep up with the ever growing volume of data. While current literature focuses on fully-supervised performance, developing efficient methods that take advantage of realistic weak supervision have yet to be explored. In this paper, we propose using scribbles to annotate LiDAR point clouds and release ScribbleKITTI, the first scribble-annotated dataset for LiDAR semantic segmentation. Furthermore, we present a pipeline to reduce the performance gap that arises when using such weak annotations. Our pipeline comprises of three stand-alone contributions that can be combined with any LiDAR semantic segmentation model to achieve up to 95.7% of the fully-supervised performance while using only 8% labeled points. Our scribble annotations and code are available at github.com/ouenal/scribblekitti.


“…LIF-Seg by Zhao et al [24] improves upon the LiDAR segmentation network Cylinder3D [25] through early- and middle-fusion with color images. Image patches around the projected points provide per-point color context for early-fusion, while mid-fusion concatenates semantic features from LiDAR and image, processed with Cylinder3D and DeepLab v3+, respectively, before processing with an additional refinement sub-network based on Cylinder3D for final semantic labels.…”

Section: Related Work (mentioning)

confidence: 99%

Real-time multi-modal semantic fusion on unmanned aerial vehicles with label propagation for cross-domain adaptation

Bultmann

Quenzel

Behnke

2023

Robotics and Autonomous Systems
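
The early-fusion step described in the citation statement above (image patches around projected points giving each point color context) can be illustrated with a small sketch. This is not the LIF-Seg implementation: it assumes a simple pinhole camera, points already expressed in camera coordinates, an image stored as nested lists, and zero padding outside the image. The function names and the parameters `fx`, `fy`, `cx`, `cy`, and `radius` are hypothetical, chosen only for illustration.

```python
def project_point(point, fx, fy, cx, cy):
    """Project a 3D point (camera frame, z forward) to integer pixel
    coordinates (u, v) with a pinhole model. Returns None if the point
    lies behind the camera. Assumed model, not taken from the paper."""
    x, y, z = point
    if z <= 0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    return round(u), round(v)


def patch_around(image, u, v, radius):
    """Flatten the (2*radius+1)^2 patch centered at pixel (u, v).
    Pixels outside the image are zero-padded (an assumption)."""
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    patch = []
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            row, col = v + dv, u + du
            if 0 <= row < height and 0 <= col < width:
                patch.extend(image[row][col])
            else:
                patch.extend([0.0] * channels)
    return patch


def early_fusion_features(points, image, fx, fy, cx, cy, radius=1):
    """Append a flattened image patch to each point's coordinates.
    Points behind the camera receive an all-zero patch."""
    channels = len(image[0][0])
    patch_len = (2 * radius + 1) ** 2 * channels
    fused = []
    for point in points:
        pixel = project_point(point, fx, fy, cx, cy)
        if pixel is None:
            context = [0.0] * patch_len
        else:
            context = patch_around(image, pixel[0], pixel[1], radius)
        fused.append(list(point) + context)
    return fused
```

In a full pipeline, the fused per-point vectors would be fed to the LiDAR network; the mid-fusion and refinement stages mentioned in the statement are not covered by this sketch.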


“…It is aimed at establishing correspondences between instances from different modalities [3], either spatial or semantic. To achieve this, many methods use camera intrinsics to correspond spatial position of pixels and points, then align per pixel-point feature or fuse the raw data [25,62,65,71,77,80]. Some works utilize depth information to project image features into 3D space and then fuse them with point-wise features [20,42,73,75,76].…”

Section: Image-point Cloud Cross-modal Learning (mentioning)

confidence: 99%

Grounding 3D Object Affordance from 2D Interactions in Images

Yang

Zhai

Luo

et al. 2023

Preprint


Grounding 3D object affordance seeks to locate objects' "action possibilities" regions in the 3D space, which serves as a link between perception and operation for embodied agents. Existing studies primarily focus on connecting visual affordances with geometry structures, e.g. relying on annotations to declare interactive regions of interest on the object and establishing a mapping between the regions and affordances. However, the essence of learning object affordance is to understand how to use it, and the manner that detaches interactions is limited in generalization. Normally, humans possess the ability to perceive object affordances in the physical world through demonstration images or videos. Motivated by this, we introduce a novel task setting: grounding 3D object affordance from 2D interactions in images, which faces the challenge of anticipating affordance through interactions of different sources. To address this problem, we devise a novel Interaction-driven 3D Affordance Grounding Network (IAG), which aligns the region feature of objects from different sources and models the interactive contexts for 3D object affordance grounding. Besides, we collect a Point-Image Affordance Dataset (PIAD) to support the proposed task. Comprehensive experiments on PIAD demonstrate the reliability of the proposed task and the superiority of our method. The project is available at https://github.com/yyvhang/IAGNet.
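
The citation statement for this entry mentions works that use depth information to project image features into 3D space before fusing them with point-wise features. A minimal sketch of that lifting step follows, assuming a pinhole camera, a dense depth map aligned with the feature map, and depth values of zero or less marking invalid pixels. The function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical; none of the cited methods is reproduced here.

```python
def backproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Invert the pinhole projection: pixel (u, v) with known depth
    becomes a 3D point (x, y, z) in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def lift_image_features(depth_map, feature_map, fx, fy, cx, cy):
    """Pair every valid pixel's feature vector with its 3D position.
    Pixels with non-positive depth are skipped (assumed invalid)."""
    lifted = []
    for v, row in enumerate(depth_map):
        for u, depth in enumerate(row):
            if depth > 0:
                position = backproject_pixel(u, v, depth, fx, fy, cx, cy)
                lifted.append((position, feature_map[v][u]))
    return lifted
```

The resulting (position, feature) pairs could then be matched to LiDAR points, for example by nearest neighbor, before fusion; that matching step is left out here.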
