Abstract. In this paper, a 2D to 3D stereo image conversion scheme is proposed for 3D content creation. The difficulty of this problem lies in estimating or assigning depth from a monocular image, which by itself does not carry sufficient depth information. To estimate the depth map, we adopt a strategy of first performing foreground/background separation, then classifying a background depth profile with a neural network, estimating the foreground depth from image cues, and finally combining the two. To enhance the stereoscopic perception of the synthesized images viewed on a 3D display, depth refinement based on a bilateral filter and HVS-based contrast modification between the foreground and background are adopted. Subjective experiments show that the stereo images generated by the proposed scheme provide good 3D perception.

Keywords: 2D to 3D image conversion, background depth profile, stereoscopic perception, depth cue estimation.
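To make the depth-refinement step mentioned above concrete, the following is a minimal Python sketch of one standard way to realize bilateral-filter-based refinement: a joint (cross) bilateral filter that smooths the depth map while snapping its edges to the edges of the color image. It is only an illustration under assumed parameters (window radius, sigma values, grayscale guide), not the implementation used in this paper.

```python
import numpy as np

def joint_bilateral_depth_refine(depth, guide, radius=5,
                                 sigma_s=3.0, sigma_r=0.1):
    """Refine a noisy/blocky depth map with a joint (cross) bilateral
    filter: spatial weights come from pixel distance, range weights
    from the *guide* (color) image, so depth edges align with color
    edges. Parameter values here are illustrative assumptions.

    depth : (H, W) float array in [0, 1]
    guide : (H, W) float array in [0, 1] (grayscale guide for brevity)
    """
    H, W = depth.shape
    out = np.zeros_like(depth)

    # Precompute the spatial Gaussian kernel once.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2.0 * sigma_s**2))

    pad = radius
    d = np.pad(depth, pad, mode='edge')
    g = np.pad(guide, pad, mode='edge')

    for y in range(H):
        for x in range(W):
            dp = d[y:y + 2 * pad + 1, x:x + 2 * pad + 1]
            gp = g[y:y + 2 * pad + 1, x:x + 2 * pad + 1]
            # Range weights are computed on the guide image, not on
            # the depth itself -- this is what makes the filter "joint".
            rng = np.exp(-((gp - guide[y, x])**2) / (2.0 * sigma_r**2))
            w = spatial * rng
            out[y, x] = np.sum(w * dp) / np.sum(w)
    return out
```

The double loop keeps the sketch transparent; a practical implementation would vectorize it or use an optimized library routine.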
Introduction

3D video applications, such as 3D multimedia, 3DTV broadcasting, and 3D gaming, are becoming increasingly popular because they offer a viewing experience that 2D video cannot match. Among them, the 3D digital photo frame is a promising consumer electronics product for the near future; 7-inch LCD panels that can be viewed without glasses (i.e., with the naked eye) are already available on the market. A traditional 2D color image, whether raw or decoded, can then be converted into a left-and-right or a multi-channel format so that it can be viewed on such 3D displays.

To enable multi-view conversion, depth information, which does not exist in the original 2D color image, must be estimated. The Depth Image Based Rendering (DIBR) technique can then be used to render/synthesize stereo or multiple views. To date, researchers have proposed several 2D to 3D conversion algorithms for static images [2, 13-15, 17] and dynamic videos [4-6, 12], aiming to mitigate the shortage of 3D content. Because still images offer fewer depth cues (e.g., motion) than video, their 2D to 3D conversion is considerably more challenging.

Recent research on automatic depth estimation from 2D photographic images falls into two categories. The first is depth from defocus/focus. S. K. Nayar and Y. Nakagawa [1] exploited the relationship between the focus level and the distance of an object from the focused plane, called shape from focus (SFF), to estimate depth. This method requires multiple images captured at different focus settings, which is beyond the scope of this paper. V. P. Namboodiri and S. Chaudhuri [2] proposed a method, called depth from defocus (DFD), to perceive depth layers from a single defocused image: they estimate the degree of blur at each pixel and use it to assign relative depth. The second category is based on multiple depth cues. For example, in [13], the Hough transform is used to detect the vanishing point as a geometric cue, from which an initial depth map is constructed. The depth map is then refined based on the texture ...
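As a rough illustration of the single-image defocus cue in [2], the sketch below uses local Laplacian energy as a per-pixel sharpness measure and maps blurrier pixels to larger (farther) relative depth. The window size and the blur-to-depth direction are illustrative assumptions of this sketch, not the estimator of [2].

```python
import numpy as np
import cv2  # OpenCV, assumed available

def defocus_depth(gray, ksize=9):
    """Crude stand-in for a DFD-style cue: local Laplacian energy as a
    sharpness measure; lower energy (blurrier) is treated as farther.
    gray : (H, W) uint8 image. The blur-to-depth sign is actually
    scene-dependent; 'blurrier = farther' is a toy assumption here."""
    lap = cv2.Laplacian(gray.astype(np.float32), cv2.CV_32F)
    sharp = cv2.boxFilter(lap * lap, -1, (ksize, ksize))
    sharp = np.sqrt(sharp)
    return 1.0 - sharp / (sharp.max() + 1e-8)  # 1 = far, 0 = near
```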
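Similarly, the geometric cue of [13] can be caricatured as follows: detect straight lines with the Hough transform, take a robust intersection of the detected lines as the vanishing point, and seed an initial depth map that is farthest at that point. All thresholds and the median-based intersection below are assumptions of this sketch, not the method of [13].

```python
def vanishing_point_depth(gray):
    """Toy vanishing-point depth cue: Hough lines -> pairwise
    intersections -> median as the vanishing point (VP) -> depth map
    that decreases with distance from the VP (1 = far, 0 = near).
    gray : (H, W) uint8 image. Thresholds are illustrative."""
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=120)
    if lines is None:
        return None

    params = [l[0] for l in lines][:30]  # cap the O(N^2) pair loop
    pts = []
    for i in range(len(params)):
        for j in range(i + 1, len(params)):
            r1, t1 = params[i]
            r2, t2 = params[j]
            # Each line satisfies x*cos(t) + y*sin(t) = r; intersect
            # two of them by solving the 2x2 linear system.
            A = np.array([[np.cos(t1), np.sin(t1)],
                          [np.cos(t2), np.sin(t2)]])
            if abs(np.linalg.det(A)) < 1e-6:  # near-parallel lines
                continue
            pts.append(np.linalg.solve(A, np.array([r1, r2])))
    if not pts:
        return None
    vp = np.median(np.array(pts), axis=0)  # robust (x, y) estimate

    # Initial depth gradient: farthest (1) at the VP, nearest (0)
    # at the image point most distant from it.
    H, W = gray.shape
    ys, xs = np.mgrid[0:H, 0:W]
    dist = np.hypot(xs - vp[0], ys - vp[1])
    return (1.0 - dist / dist.max()).astype(np.float32)
```

Both sketches share the convention that larger values mean farther away, so their outputs could in principle be fused into a single initial depth map before refinement.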