“…Combining different sources of synthetic data with real data, and then generating a realistic composition of both, has been successfully applied to various tasks, such as semi-supervised foreground-background segmentation (Remez, Huang, and Brown 2018; Alhaija et al. 2018; Dwibedi, Misra, and Hebert 2017), object detection (Dvornik, Mairal, and Schmid 2018; Dwibedi, Misra, and Hebert 2017), and 3D object pose estimation (Alhaija et al. 2018). The two cut-and-paste methods (Dwibedi, Misra, and Hebert 2017; Remez, Huang, and Brown 2018) use simple blending techniques and apply them only to the foreground object, whereas we propose to learn a blending for both the background and the foreground. Note that while Alhaija et al. (2018) and Dwibedi, Misra, and Hebert (2017) take the 3D geometry of the scene into account, they only consider rigid 3D objects.…”
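To make "simple blending applied only to the foreground object" concrete, the following is a minimal sketch of generic cut-and-paste compositing. It assumes the foreground's binary mask is softened with a box blur and then used as an alpha matte, so only the pasted object's boundary is blended while the rest of the background is left untouched. This is an illustrative stand-in, not the exact procedure of either cited work. The function names `feather_mask` and `paste` and the `radius` parameter are hypothetical.

```python
import numpy as np


def feather_mask(mask: np.ndarray, radius: int) -> np.ndarray:
    """Soften a binary HxW mask with a (2r+1)x(2r+1) box blur.

    Hypothetical helper: a simple stand-in for the blurring used to
    hide hard cut-and-paste seams. Pixels outside the mask count as 0.
    """
    alpha = mask.astype(np.float64)
    if radius <= 0:
        return alpha
    k = 2 * radius + 1
    # Zero padding lets the mask boundary fade toward the background.
    padded = np.pad(alpha, radius)
    # Integral image with a leading row/column of zeros, so that
    # integral[a, b] == padded[:a, :b].sum().
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = alpha.shape
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    return window_sum / (k * k)


def paste(background: np.ndarray, foreground: np.ndarray, mask: np.ndarray,
          top: int, left: int, radius: int = 1) -> np.ndarray:
    """Alpha-composite an HxWxC foreground patch onto the background.

    Only the foreground is blended; background pixels outside the
    pasted patch are copied unchanged (hypothetical helper).
    """
    out = background.astype(np.float64).copy()
    h, w = mask.shape
    alpha = feather_mask(mask, radius)[..., None]  # broadcast over channels
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * foreground + (1.0 - alpha) * region
    return out
```

For example, pasting a 3x3 white patch onto a 5x5 black image with `radius=1` leaves the patch center fully white and gives its corners an alpha of 4/9. A learned blending, as proposed here, would instead adapt both the foreground and the surrounding background rather than fixing this per-pixel rule.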