MessyTable: Instance Association in Multiple Camera Views

Cai, Zhongang; Zhang, Junzhe; Ren, Daxuan; Yu, Cunjun; Zhao, Haiyu; Yi, Shuai; Yeo, Chai Kiat; Loy, Chen Change

doi:10.1007/978-3-030-58621-8_1

Cited by 10 publications

(6 citation statements)

References 29 publications

“…We additionally compare with (Appearance Only), or the Hungarian algorithm with thresholding on the appearance embedding distances. This approach outperformed other methods like [5] and [42] in [24].…”

Section: Input Views Proposed Sparse Planes (mentioning)

confidence: 81%
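The "Appearance Only" baseline quoted above (an optimal one-to-one assignment followed by a distance threshold) can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name `match_with_threshold`, the `max_dist` parameter, and the toy distance values are all hypothetical, and a brute-force search over permutations stands in for the Hungarian algorithm. It returns the same minimum-cost assignment but is only practical for a handful of instances.

```python
from itertools import permutations


def match_with_threshold(dist, max_dist):
    """Min-cost one-to-one matching, then drop pairs farther apart than max_dist.

    dist[i][j] is the appearance-embedding distance between instance i in
    view A and instance j in view B (hypothetical input layout).
    Returns a list of (i, j) index pairs.
    """
    n_rows = len(dist)
    n_cols = len(dist[0]) if n_rows else 0
    if n_rows == 0 or n_cols == 0:
        return []

    # Always assign the smaller side to the larger one.
    transposed = n_rows > n_cols
    if transposed:
        dist = [list(col) for col in zip(*dist)]
        n_rows, n_cols = n_cols, n_rows

    # Brute-force stand-in for the Hungarian algorithm: try every assignment.
    best_cost, best_perm = None, None
    for perm in permutations(range(n_cols), n_rows):
        cost = sum(dist[i][j] for i, j in enumerate(perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm

    # Thresholding step: keep only matches that are close enough.
    matches = [(i, j) for i, j in enumerate(best_perm) if dist[i][j] <= max_dist]
    if transposed:
        matches = [(j, i) for i, j in matches]
    return matches


# Toy example: instance 1 in view A has no close counterpart in view B.
toy = [[0.1, 0.9],
       [0.8, 0.95]]
print(match_with_threshold(toy, max_dist=0.5))
```

The threshold is what lets an instance stay unmatched when it is visible in only one view; without it, the assignment would force a partner onto every instance on the smaller side.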

“…We evaluate correspondence separately. We follow Cai et al. [5] and use IPAA-X, or the fraction of image pairs with no less than X% of planes associated correctly. We use ground-truth plane boxes in this setting since otherwise this metric measures both plane detection and plane correspondence.…”

Section: Methods (mentioning)

confidence: 99%
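The IPAA-X definition quoted above (the fraction of image pairs in which at least X% of items are associated correctly) reduces to a short computation once per-pair counts are available. The sketch below assumes each image pair is summarized as a hypothetical `(num_correct, num_total)` tuple; how an individual association is judged correct is not specified in the quoted text and is left outside the function.

```python
def ipaa(pairs, x_percent):
    """Fraction of image pairs with at least x_percent of items associated correctly.

    pairs: list of (num_correct, num_total) tuples, one per image pair
           (hypothetical input layout; num_total must be positive).
    """
    if not pairs:
        raise ValueError("need at least one image pair")
    hits = 0
    for num_correct, num_total in pairs:
        if num_total <= 0:
            raise ValueError("num_total must be positive")
        # Integer comparison avoids floating-point issues at the threshold.
        if num_correct * 100 >= x_percent * num_total:
            hits += 1
    return hits / len(pairs)


# Four image pairs with 100%, 90%, 75% and 50% of items associated correctly.
pairs = [(4, 4), (9, 10), (3, 4), (1, 2)]
for x in (100, 90, 50):
    print(f"IPAA-{x} = {ipaa(pairs, x)}")
```

Raising X makes the metric stricter: IPAA-100 counts only pairs in which every item is associated correctly.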

“…[24]). For correspondence, we additionally report the appearance feature only baseline, which outperforms [42,5]. For camera pose estimation, we report the Camera Branch (Camera) of [24], which outperforms multiple other baselines such as [44,42].…”

Section: Wide Baseline Multiview Case (mentioning)

confidence: 99%

PlaneFormers: From Sparse View Planes to 3D Reconstruction

Agarwala, Jin, Rockwell et al. 2022. Preprint.

We present an approach for the planar surface reconstruction of a scene from images with limited overlap. This reconstruction task is challenging since it requires jointly reasoning about single image 3D reconstruction, correspondence between images, and the relative camera pose between images. Past work has proposed optimization-based approaches. We introduce a simpler approach, the PlaneFormer, that uses a transformer applied to 3D-aware plane tokens to perform 3D reasoning. Our experiments show that our approach is substantially more effective than prior work, and that several 3D-specific design decisions are crucial for its success. Project page: https://samiragarwala.github.io/PlaneFormers.

Fig. 1 (panel labels: PlaneFormer, Extracted Planes, Extracted Cameras, Refined Cameras, Refined Planes). Given a sparse set of images, our method detects planes and cameras, and produces plane correspondences and refined cameras using a Plane Transformer, from which it can reconstruct the scene in 3D.

“…• Camera Angle. Recent studies [4] have shown the enormous impact of camera angles on model performance, yet its effect in 3D human recovery is under-explored due to the unavailability of such a dataset. It is common to have datasets with fixed camera positions [18,22,40,56], where the only variation comes from the relative positions of the camera and the subjects.…”

Section: Diverse Data Generation (mentioning)

confidence: 99%

Playing for 3D Human Recovery

Cai, Zhang, Ren et al. 2021. Preprint. Self-citation.

SenseTime Research. https://gta-human.com

Figure 1. GTA-Human dataset is built from GTA-V, an open-world action game that features a reasonably realistic functioning metropolis and virtual characters living in it. Our customized toolchain enables large-scale collection and annotation of highly diverse human data (subjects, actions, locations) that we hope empowers in-depth studies on 3D human recovery. We show here a few examples we generate at various locations in the virtual world with our SMPL annotations overlaid on them.

“…Compared with Protocol 1, we observe that training with fewer actions and testing on unseen actions degrade the precision significantly, especially for cross-evaluation on the whole body category which seems to have a large action distribution misalignment with the other two categories. Furthermore, deep learning models are sensitive to viewing angles [8,64], we thus report results of cross-view (P3) in Table 9. When the model is only trained on one view (i.e., View 0), we observe a considerable domain gap across different views as the errors increase as the deviation from the test view from the training view increases.…”

mentioning

confidence: 99%

HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling

Cai, Ren, Zeng et al. 2022. Preprint. Self-citation.

Figure 1. HuMMan features multiple modalities of data format and annotations. We demonstrate a) color image, b) point cloud, c) keypoints, d) SMPL parameters and e) mesh geometry with f) texture. Each sequence is also annotated with an action label from 500 actions. Each subject has two additional high-resolution scans of naturally and minimally clothed body.
