Human detection, tracking, and recognition have applications in many areas, such as assistance robots and intelligent monitoring systems. The emergence of RGB-D cameras such as the Kinect v2 has simplified human detection and tracking. Methods based on color space depend on lighting conditions, whereas skeleton-tracking algorithms rely on depth images and are therefore more robust to lighting changes. However, skeleton information may sometimes be incorrect or lost, so an algorithm for human-target recognition is still required. This study therefore proposes a human-target tracking and recognition system that combines RGB images, depth images, body index, and skeleton information. The system first extracts the color information of five body parts (two upper arms, the torso, and two thighs) using color, depth, and skeleton information. It then analyzes the color information using a mixed nine-dimensional histogram and single-color analysis method. The algorithm also performs overlap detection during human-target tracking to prevent misidentification caused by occlusion. To test the proposed system, various scenarios were designed to simulate the complex environmental changes characteristic of the real world, and the dynamic statistical method of event statistics was used to collect results. Experiments revealed that the proposed method is robust under varying lighting conditions and increases the recognition success rate for individuals wearing similar monochrome clothing.
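As a rough illustration of comparing people by the color of the five body parts named above, the following Python sketch builds a normalized hue histogram per body part and scores two people by averaged histogram intersection. It is not the proposed method: the abstract does not define the mixed nine-dimensional histogram or the single-color analysis, so the nine hue bins, the intersection score, and all function and key names here are assumptions made purely for illustration.

```python
import colorsys

import numpy as np

# The five body parts named in the abstract (key names are hypothetical).
BODY_PARTS = (
    "left_upper_arm",
    "right_upper_arm",
    "torso",
    "left_thigh",
    "right_thigh",
)


def hue_histogram(rgb_pixels, bins=9):
    """Return a normalized hue histogram for an (N, 3) array of RGB pixels.

    Pixel values are expected in [0, 255]. Using nine hue bins is an
    assumption; the paper's nine dimensions may be defined differently.
    """
    rgb = np.asarray(rgb_pixels, dtype=float) / 255.0
    hues = np.array([colorsys.rgb_to_hsv(r, g, b)[0] for r, g, b in rgb])
    hist, _ = np.histogram(hues, bins=bins, range=(0.0, 1.0))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return float(np.minimum(h1, h2).sum())


def match_score(target, candidate):
    """Average per-part similarity between two people.

    Each argument maps a body-part name to that part's RGB pixels.
    Parts missing from either person are skipped.
    """
    scores = [
        histogram_intersection(
            hue_histogram(target[part]), hue_histogram(candidate[part])
        )
        for part in BODY_PARTS
        if part in target and part in candidate
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

In a pipeline like the one described, the pixels for each part would presumably come from the color image, selected with the help of the depth, body index, and skeleton data; that extraction step is outside this sketch.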