The same object can produce dramatically different images depending on its pose relative to the camera, which creates considerable ambiguity for spatial recognition. Different poses of the same object also lead to different orientations when the intrinsic system is used. Our current study focuses on this issue and is divided into three phases. First, we propose an object pose-estimation model that is capable of recognizing unseen views. We achieve this by building a discrete key-pose structure parameterized by azimuth and using the PHOG descriptor [20] to measure the shape correspondence between two images. A large number of instances are learned at the training stage through semi-supervised learning. We then show experimental results on our own dataset. Second, according to the analyzed criteria for the use of the intrinsic system, we combine these criteria with the pose-estimation results to recognize the frontal orientation of an object with intrinsic geometry (e.g., an LCD screen). Finally, we summarize our integrated model, which is able to classify the object category, estimate the object pose, distinguish intrinsic spatial relations between reference and target objects, and locate the target according to users' instructions.
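As an illustration of the first phase, matching a query image against discrete key poses indexed by azimuth can be sketched as follows. This is a minimal sketch under stated assumptions, not the model described here: `phog_like` is a simplified pyramid histogram of gradient orientations computed on raw image gradients rather than on extracted edge contours, and the chi-square distance, the nearest-neighbour rule, and the function names are all illustrative choices.

```python
import math


def phog_like(image, levels=2, bins=8):
    """Simplified PHOG-style descriptor (assumption: raw gradients, no edge detection).

    `image` is a list of rows of grey values. Orientation histograms weighted by
    gradient magnitude are collected over a spatial pyramid and concatenated.
    """
    h, w = len(image), len(image[0])
    mags = [[0.0] * w for _ in range(h)]
    idxs = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Differences clamped at the image border.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mags[y][x] = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % math.pi  # unsigned orientation
            idxs[y][x] = min(int(ang / math.pi * bins), bins - 1)
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level  # the pyramid level splits the image into cells x cells
        for i in range(cells):
            for j in range(cells):
                hist = [0.0] * bins
                for y in range(i * h // cells, (i + 1) * h // cells):
                    for x in range(j * w // cells, (j + 1) * w // cells):
                        hist[idxs[y][x]] += mags[y][x]
                feats.extend(hist)
    total = sum(feats)
    return [f / total for f in feats] if total > 0 else feats


def chi2_distance(p, q, eps=1e-10):
    """Chi-square distance between two histograms (assumed similarity measure)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def estimate_azimuth(query, key_poses):
    """Return (azimuth, distance) of the key pose whose descriptor is closest.

    `key_poses` maps an azimuth in degrees to a list of training descriptors.
    """
    best_az, best_d = None, math.inf
    for az, descriptors in key_poses.items():
        for d in descriptors:
            dist = chi2_distance(query, d)
            if dist < best_d:
                best_az, best_d = az, dist
    return best_az, best_d
```

In use, each key pose would hold descriptors of training instances seen from that azimuth, and an unseen view would be assigned the azimuth of its nearest descriptor; a finer azimuth grid trades more training data for better angular resolution.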