Indoor human action recognition, essential across various applications, faces significant challenges such as orientation constraints and identification limitations, particularly in systems reliant on non-contact devices. Self-occlusions and non-line of sight (NLOS) situations are important representatives among them. To address these challenges, this paper presents a novel system utilizing dual Kinect V2, enhanced by an advanced Transmission Control Protocol (TCP) and sophisticated ensemble learning techniques, tailor-made to handle self-occlusions and NLOS situations. Our main works are as follows: (1) a data-adaptive adjustment mechanism, anchored on localization outcomes, to mitigate self-occlusion in dynamic orientations; (2) the adoption of sophisticated ensemble learning techniques, including a Chirp acoustic signal identification method, based on an optimized fuzzy c-means-AdaBoost algorithm, for improving positioning accuracy in NLOS contexts; and (3) an amalgamation of the Random Forest model and bat algorithm, providing innovative action identification strategies for intricate scenarios. We conduct extensive experiments, and our results show that the proposed system augments human action recognition precision by a substantial 30.25%, surpassing the benchmarks set by current state-of-the-art works.