The development of technologies for capturing human movement has been motivated by potential applications across a wide variety of fields. However, capturing human motion in 3D is difficult outdoors, where the surroundings cannot be controlled. In this paper, a stereo camera rig with an ultra-wide baseline distance, built from conventional cameras with fish-eye lenses, is proposed. The fish-eye lenses provide a wide field of view (FOV), which increases the coverage area and allows the baseline distance to be increased while retaining the common viewing area that the two cameras need to operate as a stereo pair. We propose a passive marker-based approach to track the motion of the object, in which adaptive thresholding is applied to extract each small pink polyester marker from the video frames. Because the cameras have fish-eye lenses, it is difficult to estimate depth information using a pinhole camera model. Instead, we restore the 3D positions by developing a relationship between pixel dimensions and distances in the image and real-world coordinates. Occlusion detection is also considered, because marker occlusion is one of the major challenges in marker-based capture of articulated human kinematics. The detection algorithm differentiates among types of occlusion and predicts missing marker positions where necessary. As the design is intended to be mounted on a moving carrier, such as a drone or car, a method for compensating for the camera's ego-motion is proposed. The proposed 3D positioning and tracking system is tested in different situations to validate its applicability as a stereo camera rig as well as its performance for motion capture. Its performance is compared with that of Vicon, a standard motion capture system, and is shown to have the same order of accuracy at lower cost.
INDEX TERMS Motion capture, 3D positioning, stereo vision, motion tracking, high precision.