In this paper we tackle the problem of unsupervised domain adaptation for semantic segmentation, where we attempt to transfer the knowledge learned from synthetic datasets with ground-truth labels to real-world images without any annotation. Under the hypothesis that the structural content of images is the most informative and decisive factor for semantic segmentation and can be readily shared across domains, we propose a Domain Invariant Structure Extraction (DISE) framework that disentangles images into domain-invariant structure and domain-specific texture representations. The disentangled representations further enable image translation across domains and label transfer to improve segmentation performance. Extensive experiments verify the effectiveness of our proposed DISE model and demonstrate its superiority over several state-of-the-art approaches.
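The structure/texture split described in this abstract can be illustrated with a toy sketch. It is not the DISE model: the abstract does not specify the architecture, so the learned components are replaced here by a hand-written proxy in which "texture" is the per-channel mean and standard deviation of an image and "structure" is the normalized residual. The function names (`encode`, `decode`, `translate`) and the random arrays standing in for images and labels are assumptions made purely for illustration.

```python
import numpy as np

def encode(image, eps=1e-6):
    """Split an H x W x C image into (structure, texture).

    Toy proxy, NOT the learned DISE encoders: texture is the per-channel
    mean and standard deviation, structure is the normalized residual.
    """
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True) + eps
    structure = (image - mean) / std
    return structure, (mean, std)

def decode(structure, texture):
    """Recombine a structure map with a texture code."""
    mean, std = texture
    return structure * std + mean

def translate(source_image, target_image):
    """Render the source structure with the target texture."""
    source_structure, _ = encode(source_image)
    _, target_texture = encode(target_image)
    return decode(source_structure, target_texture)

# Random arrays stand in for a labeled synthetic image and an unlabeled real one.
rng = np.random.default_rng(0)
synthetic = rng.uniform(0.0, 1.0, size=(4, 6, 3))
real = rng.uniform(0.2, 0.5, size=(4, 6, 3))
synthetic_labels = rng.integers(0, 5, size=(4, 6))

translated = translate(synthetic, real)

# Label transfer: the translated image keeps the source structure, so the
# source label map can be paired with it as extra supervision.
training_pair = (translated, synthetic_labels)
```

In this proxy the translated image takes on the channel statistics of the real image while its normalized layout stays that of the synthetic one, which is why the synthetic label map can still be paired with it. A learned model would replace both the statistics and the normalization with trained networks.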
With the increasing availability of capture and display devices dedicated to immersive media, the coding and transmission of these media have recently become a highest-priority subject of standardization. Different levels of immersiveness are defined according to the degree of freedom the observer has to move within the immersive media scene. These levels range from three degrees of freedom (3DoF), which allow the user to look around in all directions from a fixed point of view, to six degrees of freedom (6DoF), where the user can freely alter the viewpoint within the scene. The Moving Picture Experts Group (MPEG) of ISO/IEC is developing a standards suite on "Coded Representation of Immersive Media," called MPEG-I, to provide technical solutions for building blocks of the media transmission chain, ranging from architecture, systems tools, and coding of video and audio signals to point clouds and timed text. This paper presents an overview of recent and ongoing standardization efforts in this area. While some specifications, such as High Efficiency Video Coding or version 1 of the Omnidirectional Media Format, are already available, other activities are under development or in the exploration phase. The paper addresses the status of these efforts with a focus on video signals, indicates the development timelines, summarizes the main technical details, and provides pointers to further references.
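The difference between three and six degrees of freedom can be made concrete with a small sketch. The names below (`ViewerPose`, `constrain_to_3dof`) are hypothetical and are not taken from any MPEG-I specification. The sketch only assumes that a viewer pose consists of three rotation angles and three position coordinates, and that a three-degrees-of-freedom experience pins the position to a fixed point.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ViewerPose:
    """Hypothetical viewer pose: three rotations plus three translations."""
    yaw: float = 0.0    # rotation angles, in degrees
    pitch: float = 0.0
    roll: float = 0.0
    x: float = 0.0      # position within the scene, in metres
    y: float = 0.0
    z: float = 0.0

def constrain_to_3dof(pose, anchor=(0.0, 0.0, 0.0)):
    """Keep the viewing direction but pin the position to a fixed point."""
    ax, ay, az = anchor
    return replace(pose, x=ax, y=ay, z=az)

# The viewer turns their head and also steps sideways and forward.
requested = ViewerPose(yaw=90.0, pitch=-10.0, x=0.5, z=1.2)

pose_3dof = constrain_to_3dof(requested)  # rotation only, fixed viewpoint
pose_6dof = requested                     # rotation and free movement
```

With three degrees of freedom only the rotation of the requested pose is honoured, whereas with six the renderer would also have to synthesize the view from the new position.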