The goal of this study was to develop a framework for classifying dependence in ambulation by applying a 3D convolutional neural network (3D-CNN) to video data recorded with a smartphone during inpatient rehabilitation therapy for stroke patients. From 2311 video clips, 1218 walking action cases were collected from 206 stroke patients receiving inpatient rehabilitation therapy (63.24 ± 14.36 years old). As ground truth, dependence in ambulation was assessed and labeled using the functional ambulatory categories (FACs) and the Berg balance scale (BBS). Dependent ambulation was defined as a FAC score less than 4 or a BBS score less than 45. From the keypoint locations of the tracked target’s posture, we extracted a patient-centered video and a patient-centered pose. The patient-centered video was then fed into the 3D-CNN, and the patient-centered pose was used to measure swing time asymmetry. Finally, we evaluated the video-based classification of dependence in ambulation using fivefold cross-validation. When trained on labels derived from FACs and BBS, the 3D-CNN achieved 86.3% accuracy, 87.4% precision, 94.0% recall, and a 90.5% F1 score. When this 3D-CNN was combined with swing time asymmetry, performance improved (88.7% accuracy, 89.1% precision, 95.7% recall, and a 92.2% F1 score). The proposed framework could be used to alert clinicians or caregivers when stroke patients with dependent ambulation move alone without assistance. In addition, monitoring dependence in ambulation can facilitate the design of individualized rehabilitation strategies for stroke patients with impaired mobility and balance function.
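
The labeling rule and the gait feature described above can be illustrated with a minimal Python sketch. The FAC and BBS thresholds come from the text. Everything else is an assumption made for illustration: the function names are hypothetical, and because the text does not give a formula for swing time asymmetry, the sketch uses one common choice, the ratio of the longer to the shorter mean swing time (1.0 means a symmetric gait). It is not presented as the computation used in the study.

```python
FAC_THRESHOLD = 4   # from the text: a FAC score below 4 indicates dependent ambulation
BBS_THRESHOLD = 45  # from the text: a BBS score below 45 indicates dependent ambulation


def is_dependent(fac: int, bbs: int) -> bool:
    """Ground-truth label: dependent if FAC < 4 or BBS < 45."""
    return fac < FAC_THRESHOLD or bbs < BBS_THRESHOLD


def swing_time_asymmetry(left_swing_s: float, right_swing_s: float) -> float:
    """Assumed definition (not from the source): longer swing time divided
    by shorter swing time, so 1.0 is symmetric and larger is more asymmetric."""
    if left_swing_s <= 0 or right_swing_s <= 0:
        raise ValueError("swing times must be positive")
    longer = max(left_swing_s, right_swing_s)
    shorter = min(left_swing_s, right_swing_s)
    return longer / shorter


if __name__ == "__main__":
    # Hypothetical patient: FAC 3, BBS 50 -> dependent because FAC < 4.
    print(is_dependent(fac=3, bbs=50))
    # Hypothetical mean swing times in seconds for the left and right legs.
    print(swing_time_asymmetry(0.40, 0.50))
```

Note that a patient is labeled dependent when either criterion is met, so a patient with FAC 4 and BBS 45 falls on the independent side of both thresholds.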