Evaluating human gait with smartphone-based pose estimation algorithms is an attractive alternative to costly, lab-bound instrumented assessment and enables real-time gait capture for clinical use. Smartphone-based systems, such as OpenPose and BlazePose, have demonstrated potential for virtual motion assessment but still lack the accuracy and repeatability required for clinical viability. Sequence-to-sequence (Seq2seq) architectures offer an alternative to conventional deep learning techniques for predicting joint kinematics during gait. This study enhances the low-powered BlazePose algorithm by incorporating a Seq2seq autoencoder deep learning model. To obtain accurate and reliable reference data, gait was recorded synchronously with one RGB camera and ten Vicon cameras at three self-selected walking speeds (fast, normal, and slow). The approach offers a new avenue for remote gait assessment, using Seq2seq architectures inspired by natural language processing (NLP) to improve pose estimation accuracy. Compared with BlazePose alone, combining BlazePose with a one-dimensional convolutional Long Short-Term Memory network (1D-LSTM), a Gated Recurrent Unit (GRU), and a Long Short-Term Memory (LSTM) network reduced the average mean absolute error of the left ankle joint angle from 13.4° to 5.3° for fast gait, from 16.3° to 7.5° for normal gait, and from 15.5° to 7.5° for slow gait. The use of synchronized reference data and rigorous testing supports the robustness and credibility of these findings.
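To make the reported error metric concrete, the following minimal Python sketch shows how a mean absolute error between estimated and reference joint angles could be computed, and what relative reductions the left ankle figures quoted above imply. The function names and the dictionary layout are illustrative assumptions rather than the study's actual code; only the six error values are taken from the text.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute difference (in degrees) between two equal-length angle series."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which the error dropped from `before` to `after`."""
    return 100.0 * (before - after) / before


# Left ankle MAE values (degrees) quoted in the abstract:
# (BlazePose alone, BlazePose combined with the Seq2seq models).
# The dictionary itself is a hypothetical container for illustration.
reported_mae = {
    "fast": (13.4, 5.3),
    "normal": (16.3, 7.5),
    "slow": (15.5, 7.5),
}

for speed, (baseline, refined) in reported_mae.items():
    print(f"{speed}: {relative_reduction(baseline, refined):.1f}% lower MAE")
```

Under these assumptions, the quoted values correspond to reductions of roughly 60% for fast gait, 54% for normal gait, and 52% for slow gait.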