2022
DOI: 10.1007/978-3-031-19836-6_24

Relationformer: A Unified Framework for Image-to-Graph Generation

Abstract: A comprehensive representation of an image requires understanding objects and their mutual relationship, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach consisting of object detection followed by a separate relation prediction, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework,…

Cited by 21 publications
(2 citation statements)
References 60 publications
“…Road Graph Representations. There have been plenty of studies for vector road mapping, mainly relying on either the rasterized road map or the keypoint/vertex-based graph representations, and derived two categories, the segmentation-based (Máttyus, Luo, and Urtasun 2017; Zhou, Zhang, and Wu 2018; Mei et al 2021; Wang et al 2023; Batra et al 2019; Cheng et al 2021) and the keypoint-based approaches (He et al 2020; He, Garg, and Chowdhury 2022; Shit et al 2022; Yang et al 2023; Xie et al 2023). Regarding the popularity of end-to-end learning for better performance, the state-of-the-art approaches (He, Garg, and Chowdhury 2022; Xu et al 2023b) mainly learn keypoints (i.e., graph vertices) and the connectivity between vertices while using the rasterized road masks/maps as the additional supervision signals to enhance the feature representation ability of ConvNets.…”
Section: Related Work
mentioning
confidence: 99%
“…Second, in the encoder stage, We propose a novel combined CNN and ViT molecular image encoder network that is aware of both local atom information representation and long-range interatomic feature dependencies in the learning process. Third, in the decoder stage, we combine the Pix2seq and Relationformer architecture [14,15], using an autoregressive decoder to predict atoms and their coordinates as a sequence and predict the chemical bonds between the atoms, composing the 2D molecular graph. Finally, we include chemical knowledge as symbolic constraints to the model, such as determining the chirality of atoms from the predicted pattern.…”
mentioning
confidence: 99%