Stereo vision systems that capture 3D spatial data are well suited as computer interfaces for emerging technologies such as VR and robotics. They are inexpensive, yet practical implementation is hindered by their high CPU processing requirements. In this paper, a scheme is therefore developed that exploits GPUs to process the high-definition images required to track an object's position and orientation in 3D space in real time. This is done by porting the essential processing algorithms to a GPU and optimizing them to run efficiently in parallel. The tracked object is a pen, whose tip position and orientation are followed in real time. The writing captured by the system is then compared with that captured by a commercial digitizing tablet. The results indicate that the pen tip can be tracked in 3D space with a percentage error below 1.7% within a distance of 40 cm from a stereo camera capturing at 30 frames per second.
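To make the geometry behind stereo tracking concrete, the following minimal sketch recovers a 3D point from one matched pixel pair and computes a percentage error against a reference value. It assumes a rectified pinhole stereo pair, which is the textbook model and not necessarily the formulation used in this work. The function names and the parameters `focal_px`, `baseline_m`, `cx`, and `cy` are hypothetical and chosen for illustration only.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Recover (X, Y, Z) in metres from a matched pixel pair.

    Assumes a rectified stereo pair: both cameras share the focal
    length focal_px (pixels) and principal point (cx, cy), and are
    separated horizontally by baseline_m (metres).
    """
    disparity = x_left - x_right  # horizontal shift in pixels
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = focal_px * baseline_m / disparity  # depth: Z = f * B / d
    x = (x_left - cx) * z / focal_px       # back-project through the left camera
    y3d = (y - cy) * z / focal_px
    return x, y3d, z


def percentage_error(measured, reference):
    """Absolute error of measured relative to reference, in percent."""
    return abs(measured - reference) / abs(reference) * 100.0
```

Because depth is inversely proportional to disparity, a fixed matching error in pixels produces a depth error that grows with distance from the camera, which is one reason accuracy figures for stereo systems are usually quoted for a stated working range.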