INTRODUCTION: Low-cost RGB-D depth sensors have recently emerged as a promising alternative for motion capture (Mocap) in clinical gait assessment. However, depth-sensor-based Mocap (D-Mocap) suffers from low accuracy and poor stability in 3D joint estimation due to noise, self-occlusion, interference, and other technical limitations, which prevents it from being widely used in health-related applications. OBJECTIVES: The primary objective of this study is to integrate nonlinear Kalman filters (KFs) and evolutionary algorithms to enhance the quality of D-Mocap data by jointly considering kinematic and anthropometric constraints at the joint level and the skeleton level, respectively. METHODS: We propose a hybrid approach, referred to as TKF-DE, that synergistically integrates the Tobit Kalman filter (TKF) and the differential evolution (DE) algorithm for human motion enhancement. Specifically, the joint-level TKF provides a predictive distribution for each joint that is kinematically admissible in time and probabilistically amenable in space, for use in skeleton-level DE optimization with respect to all bone lengths. Two predictive distributions of the TKF, Gaussian and uniform, are tested and compared in terms of their effectiveness in generating the initial DE population. RESULTS: Two sets of motion capture data, one simulated and one real-world, are used to validate the proposed TKF-DE methods. The first dataset is from the Carnegie Mellon University (CMU) database, which contains a wide variety of motions; it is simplified to a 21-joint skeleton and corrupted with additive white Gaussian noise (AWGN). The second dataset was collected at two labs at Oklahoma State University (OSU). The Orbbec depth sensor and the Nuitrack SDK were used for D-Mocap data acquisition, along with an optical Mocap system that was time-synchronized and skeleton-matched with the D-Mocap system as a reference for evaluation.
The results confirm that the proposed TKF-DE algorithms significantly outperform other nonlinear KFs, including the extended KF (EKF), the unscented KF (UKF), and the TKF alone, with improved accuracy and stability in the estimation of joint positions, bone lengths, and joint angles. The Gaussian predictive distribution is also shown to perform better than the uniform one, further validating the efficacy and synergy of the two key components of the TKF-DE algorithm. CONCLUSION: Our research synergistically integrates the TKF and DE algorithms in one framework, referred to as TKF-DE, that takes advantage of kinematic and anthropometric constraints for human motion enhancement. The experimental results on two D-Mocap datasets show that the proposed TKF-DE method significantly improves the quality of D-Mocap data in terms of joint positions, bone lengths, and joint angles. This study brings D-Mocap technology one step closer to possible health-related applications.
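
ILLUSTRATIVE SKETCH: The skeleton-level step described in METHODS can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-joint predictive means are taken as given (standing in for the TKF output), the predictive distribution is assumed to be an isotropic Gaussian with a single standard deviation `sigma`, and the cost function, the `bone_sigma` weight, and the names `bone_length_error`, `cost`, and `de_refine` are all hypothetical. The optimizer is a plain DE/rand/1/bin loop whose initial population is drawn from the Gaussian predictive distribution.

```python
import math
import random


def bone_length_error(pose, bones, ref_lengths):
    """Sum of squared deviations of the pose's bone lengths from reference lengths."""
    return sum((math.dist(pose[a], pose[b]) - ref) ** 2
               for (a, b), ref in zip(bones, ref_lengths))


def cost(pose, mean, sigma, bones, ref_lengths, bone_sigma=0.005):
    """Hypothetical cost: stay near the predictive mean while matching bone lengths."""
    prior = sum(((p - m) / sigma) ** 2
                for jp, jm in zip(pose, mean) for p, m in zip(jp, jm))
    return 0.5 * prior + 0.5 * bone_length_error(pose, bones, ref_lengths) / bone_sigma ** 2


def de_refine(mean, sigma, bones, ref_lengths,
              pop_size=30, gens=150, F=0.6, CR=0.9, seed=0):
    """Refine a skeleton pose with DE/rand/1/bin, seeded from a Gaussian around `mean`."""
    rng = random.Random(seed)
    flat_mean = [c for joint in mean for c in joint]
    dim = len(flat_mean)

    def unflatten(v):
        return [tuple(v[i:i + 3]) for i in range(0, dim, 3)]

    def f(v):
        return cost(unflatten(v), mean, sigma, bones, ref_lengths)

    # Initial population: the predictive mean itself plus Gaussian samples around it.
    pop = [flat_mean[:]] + [[rng.gauss(m, sigma) for m in flat_mean]
                            for _ in range(pop_size - 1)]
    scores = [f(v) for v in pop]

    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = [pop[a][k] + F * (pop[b][k] - pop[c][k])
                     if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            # Greedy selection: keep the trial only if it is no worse than the target.
            s = f(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s

    best = min(range(pop_size), key=scores.__getitem__)
    return unflatten(pop[best]), scores[best]


if __name__ == "__main__":
    # Toy 3-joint chain (metres); the predicted bones are slightly too long/short.
    mean = [(0.0, 0.0, 0.0), (0.0, 0.33, 0.0), (0.0, 0.55, 0.0)]
    bones = [(0, 1), (1, 2)]
    ref_lengths = [0.30, 0.25]
    pose, score = de_refine(mean, 0.02, bones, ref_lengths)
    print(pose, score)
```

Because the predictive mean is included in the initial population and selection is greedy, the refined pose in this sketch never has a larger bone-length error than the mean it started from. Swapping `rng.gauss` for a bounded uniform draw would give a counterpart to the uniform initialization that the abstract compares against.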