Abstract-Recent experience with stereoscopic image and video conversion has sharply increased the demand for it. Although a 3D stereoscopic view enhances visual quality compared to 2D, the depth information required to generate a 3D view is unavailable for existing 2D content. There is therefore a strong need to generate depth information. This paper uses a fusion of monocular cues, namely motion, the Aerial Perspective (AP) cue, the Linear Perspective (LP) cue, and the defocus cue, to estimate depth. The proposed system includes a mechanism to re-estimate the depth map when the estimated depth map is inaccurate, for example under fast motion or false foreground estimation. The algorithm is tested under different conditions: sequences with camera motion and multiple objects, static cameras with a stationary background, a highly dynamic foreground, a background with little motion, and motion behind the foreground. The experimental results show that the generated depth map is very close to the real depth map; thus, the algorithm can be applied to 2D-to-3D conversion. To evaluate the performance of our system, its results are compared with those of existing algorithms. A subjective evaluation test was also performed on the proposed algorithm. The results show that the proposed system performs well.

Keywords-Depth Map, 3D Video, 2D-to-3D conversion, Monocular cue, motion cue.

I. INTRODUCTION

A number of 3D hardware devices are available, such as 3D TVs, Blu-ray players, smartphones, the Kinect camera, and laser cameras. New 3D content can be generated with such devices. In this approach, two or more cameras are fixed at particular angles, and two views (left and right) are generated to obtain a stereoscopic view. A large number of companies work in this field, including Dynamic Digital Depth (DDD), HD logicx, Himax Technologies, In-Tree, Legend 3D, Samsung, LG, Stereo D, JVC, and IMAX, among others. However, researchers are still working on algorithms that give superior depth estimation quality.
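To illustrate why depth information is needed to produce the left and right views, the following minimal sketch shifts each pixel horizontally by a disparity derived from its depth value. It is an illustration under stated assumptions, not the algorithm proposed in this paper: the function names `render_stereo_pair` and `fill_holes`, the parameter `max_disparity`, the convention that a depth value of 255 denotes the nearest point, and the crude left-neighbour hole filling are all hypothetical choices made for the example.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale image: rows of pixel intensities


def fill_holes(row: List[Optional[int]], fallback: List[int]) -> None:
    """Fill unassigned pixels (None) in place with the nearest pixel to the left.

    If no pixel exists to the left, the source pixel at that column is used.
    """
    last: Optional[int] = None
    for x in range(len(row)):
        if row[x] is None:
            row[x] = last if last is not None else fallback[x]
        last = row[x]


def render_stereo_pair(
    image: Image, depth: Image, max_disparity: int = 4
) -> Tuple[Image, Image]:
    """Generate left and right views by depth-driven horizontal pixel shifts.

    Assumption: depth values lie in 0..255 and 255 is nearest to the camera,
    so nearer pixels receive larger shifts (up to max_disparity // 2 per view).
    """
    height, width = len(image), len(image[0])
    left = [[None] * width for _ in range(height)]
    right = [[None] * width for _ in range(height)]
    for y in range(height):
        # Process far pixels first so that nearer pixels overwrite them.
        order = sorted(range(width), key=lambda x: depth[y][x])
        for x in order:
            shift = (depth[y][x] * max_disparity) // (2 * 255)
            if 0 <= x + shift < width:
                left[y][x + shift] = image[y][x]   # near objects move right
            if 0 <= x - shift < width:
                right[y][x - shift] = image[y][x]  # near objects move left
        fill_holes(left[y], image[y])
        fill_holes(right[y], image[y])
    return left, right


# Usage: one row in which only the third pixel is near the camera.
row_image = [[10, 20, 30, 40, 50, 60]]
row_depth = [[0, 0, 255, 0, 0, 0]]
left_view, right_view = render_stereo_pair(row_image, row_depth, max_disparity=2)
```

With a flat depth map (all zeros) both views equal the input, which shows that without depth variation no stereoscopic effect can be synthesized from a single view.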
Depth can be estimated from a video or image using either a monocular view or a binocular view. A binocular video consists of two images or two videos corresponding to a left view and a right view, so depth can be extracted by comparing features between the two views. A monocular view consists of a single image or video. In this case, the left view and the right view must be generated from the single view using various cues such as color, the Aerial Perspective cue, the Linear Perspective cue, motion, and focus/defocus. Thus, estimating depth from a monocular video or image is a difficult task. The conversion of 2D video to 3D involves two steps: depth estimation from the given 2D video, and generation of the two views (left and right). The generation of the two views is performed by algorithms known as depth-image-based rendering (DIBR). These algorithms are well understood, and existing algorithms produce good-quality video. Therefore, there is a requirement to develop a device (e...