2021
DOI: 10.48550/arxiv.2105.06458
Preprint

High-Resolution Complex Scene Synthesis with Transformers

Cited by 2 publications (3 citation statements)
References 17 publications

“…We condition the generation on the layout predicted from the SGTransformer. To this purpose, we encode the layout by mapping each object into a triplet of the form o = (c_y, tl_y, br_y), as done by [Jahn et al., 2021]. Here, c_y represents the class index, and tl_y and br_y the bounding box coordinates on a diagonal, top left and bottom right, respectively.…”
Section: Layout To Image (mentioning)
confidence: 99%
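
The excerpt above describes a simple layout encoding: each object becomes a triplet (c_y, tl_y, br_y), i.e. a class index plus the top-left and bottom-right corners of its bounding box. The sketch below illustrates that encoding in Python; the names (LayoutObject, encode_layout) and the [0, 1] coordinate normalization are assumptions for illustration, not the cited paper's actual implementation.

# Minimal sketch (hypothetical names) of the triplet layout encoding quoted above:
# each object is mapped to o = (c_y, tl_y, br_y), a class index plus the
# top-left and bottom-right bounding-box corners.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayoutObject:
    class_index: int                    # c_y: index into a fixed object-category vocabulary
    top_left: Tuple[float, float]       # tl_y: (x_min, y_min), assumed normalized to [0, 1]
    bottom_right: Tuple[float, float]   # br_y: (x_max, y_max), assumed normalized to [0, 1]


def encode_layout(objects: List[LayoutObject]) -> List[Tuple[int, Tuple[float, float], Tuple[float, float]]]:
    # Map each object to the (c_y, tl_y, br_y) triplet used to condition generation.
    return [(o.class_index, o.top_left, o.bottom_right) for o in objects]


if __name__ == "__main__":
    layout = [
        LayoutObject(class_index=3, top_left=(0.10, 0.20), bottom_right=(0.45, 0.80)),
        LayoutObject(class_index=7, top_left=(0.50, 0.30), bottom_right=(0.95, 0.90)),
    ]
    print(encode_layout(layout))
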
“…Our work complements GLIGEN and incorporates its grounding token design. Our work is also in line with image generation from layouts [31,25,26,10,13]. These techniques typically operate in a closed-world environment characterized by a fixed vocabulary and access to predefined layouts.…”
Section: Related Work (mentioning)
confidence: 77%
“…We compare with a wide range of baselines, including open-source and closed-source models. Open-sourced models include (1) GLIGEN [12]: a fine-tuned Stable Diffusion [10] Dataset. As mentioned before, our model is finetuned on COCO [14] images and captions with LVIS [8]'s instance annotations.…”
Section: Methods (mentioning)
confidence: 99%