This paper presents a method for tracking two hands with a monocular camera for human-machine interaction (HMI). To distinguish the user's face from the hands, the face is tracked as well. The targets are tracked independently when they are far apart; when they are likely to interfere with one another, they are merged and tracked jointly in a higher-dimensional state space with dependent likelihood measurements. While one target is tracked independently, the other targets are masked to reduce skin-color interference on the tracked target. Multiple cues are used to verify the hypotheses generated by the particle filter: the combination of a locally discriminative color-weighted image with the back-projection image of the reference color model, the motion history image, and the gradient orientation feature. When the targets approach or overlap each other, a multiple importance sampling (MIS) particle filter generates tracking hypotheses for the merged targets through skin blob reasoning and depth order estimation. These joint hypotheses are then evaluated using the visual cues of the occluded face template, the hand shape gradient orientation, motion continuity, and the forearm equation. Experimental results demonstrate real-time efficiency and robustness in comparison with a state-of-the-art human pose estimation method.
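
To make the independent tracking mode more concrete, the following is a minimal Python sketch under stated assumptions, not the implementation described in this paper. It shows a distance test for switching between independent and joint tracking, and one predict/weight/resample step of a plain bootstrap particle filter whose weight fuses several cue likelihoods. The names `should_merge`, `fuse_cues`, `make_gaussian_cue` and `particle_filter_step` are hypothetical. The distance threshold, the product fusion rule, the random-walk motion model and multinomial resampling are all assumptions. The MIS proposal, target masking and the joint higher-dimensional state are not modeled here.

```python
import math
import random


def should_merge(centers, radius):
    """Return True when any two target centers are closer than `radius`.

    `centers` maps a target name to an (x, y) position. The fixed distance
    threshold is an assumption made for this sketch.
    """
    names = list(centers)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            (x1, y1), (x2, y2) = centers[names[i]], centers[names[j]]
            if math.hypot(x1 - x2, y1 - y2) < radius:
                return True
    return False


def fuse_cues(cue_scores):
    """Combine per-cue likelihoods by multiplication (assumed fusion rule)."""
    weight = 1.0
    for score in cue_scores:
        weight *= score
    return weight


def make_gaussian_cue(center, sigma):
    """Build a toy cue: an unnormalized Gaussian bump around `center`.

    This stands in for a real image cue (color, motion, gradient) and is
    purely illustrative.
    """
    cx, cy = center

    def cue(particle):
        x, y = particle
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        return math.exp(-d2 / (2.0 * sigma ** 2))

    return cue


def particle_filter_step(particles, cue_fns, noise_std, rng):
    """Run one predict / weight / resample step for a single target.

    Returns the resampled particles and the weighted-mean position estimate.
    """
    # Predict: random-walk motion model (assumption).
    predicted = [
        (x + rng.gauss(0.0, noise_std), y + rng.gauss(0.0, noise_std))
        for x, y in particles
    ]
    # Weight: evaluate every cue on every hypothesis and fuse the scores.
    weights = [fuse_cues([fn(p) for fn in cue_fns]) for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # Estimate: weighted mean of the predicted particles.
    estimate = (
        sum(w * p[0] for w, p in zip(weights, predicted)),
        sum(w * p[1] for w, p in zip(weights, predicted)),
    )
    # Resample: multinomial resampling with replacement.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


if __name__ == "__main__":
    rng = random.Random(0)
    cues = [make_gaussian_cue((50.0, 40.0), 5.0), make_gaussian_cue((50.0, 40.0), 8.0)]
    particles = [(rng.uniform(30, 70), rng.uniform(20, 60)) for _ in range(500)]
    for _ in range(10):
        particles, estimate = particle_filter_step(particles, cues, 2.0, rng)
    print("estimate:", estimate)
    print("merge?", should_merge({"face": (50, 10), "left": (20, 60), "right": (80, 60)}, 30))
```

In a full tracker, `should_merge` would decide per frame whether each target runs its own `particle_filter_step` or whether the close targets are handed to a joint filter. The toy Gaussian cues would be replaced by likelihoods computed from the image.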