Freezing of gait (FOG) is a poorly understood, heterogeneous gait disorder seen in patients with parkinsonism that contributes to significant morbidity and social isolation. FOG is currently measured either with scales that are typically administered by movement disorders specialists (e.g., the MDS-UPDRS) or with patient-completed questionnaires (e.g., the N-FOG-Q). Both approaches are inadequate for capturing the heterogeneous nature of the disorder and are unsuitable for use in clinical trials. The purpose of this study was to devise a method to measure FOG objectively, thereby improving our ability to identify it and to accurately evaluate new therapies. A major innovation of our study is that it is the first to apply explainable, multi-task deep learning models to quantify FOG over the course of the medication cycle and at varying levels of parkinsonism severity, and it does so using the largest sample of its kind (>30 h, N = 57). We trained interpretable deep learning models with multi-task learning to simultaneously score FOG (cross-validated F1 score 97.6%), identify medication state (OFF vs. ON levodopa; cross-validated F1 score 96.8%), and measure overall Parkinson's disease (PD) severity (MDS-UPDRS-III score prediction error ≤ 2.7 points), using kinematic data from a well-characterized sample of N = 57 patients during levodopa challenge tests. The proposed model was able to explain how kinematic movements are associated with each FOG severity level, and these explanations were highly consistent with the features that movement disorders specialists are trained to recognize as characteristic of freezing. Overall, we demonstrate that deep learning models, by capturing complex movement patterns in kinematic data, can automatically and objectively score FOG with high accuracy. These models have the potential to discover novel kinematic biomarkers for FOG that could be used for hypothesis generation and, potentially, as clinical trial outcome measures.
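The multi-task set-up described above, in which one model is trained to score FOG, classify medication state, and predict the MDS-UPDRS-III score at the same time, can be illustrated with a combined training loss. The sketch below is a minimal, hypothetical example and not the authors' implementation. It assumes softmax cross-entropy over the FOG severity classes, binary cross-entropy for OFF vs. ON levodopa, squared error for the MDS-UPDRS-III score, and a plain weighted sum of the three terms; the function names, the number of FOG classes, and the weights are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample (assumed FOG severity head).

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def binary_cross_entropy(logit: float, target: int) -> float:
    """Binary cross-entropy on a single logit (assumed medication head).

    target: 0 = OFF levodopa, 1 = ON levodopa (hypothetical encoding).
    Stable form: max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def multitask_loss(
    fog_logits: Sequence[float],
    fog_label: int,
    med_logit: float,
    med_label: int,
    updrs_pred: float,
    updrs_true: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of the three per-task losses (hypothetical weighting)."""
    w_fog, w_med, w_sev = weights
    l_fog = cross_entropy(fog_logits, fog_label)       # FOG severity class
    l_med = binary_cross_entropy(med_logit, med_label)  # OFF vs. ON
    l_sev = (updrs_pred - updrs_true) ** 2              # MDS-UPDRS-III regression
    return w_fog * l_fog + w_med * l_med + w_sev * l_sev
```

For example, `multitask_loss([0.0, 0.0], 0, 0.0, 1, 10.0, 12.0)` evaluates to 2·ln 2 + 4, about 5.386. Summing per-task losses lets the gradients from all three tasks update a shared kinematic encoder; in practice the weights would likely need tuning, because the squared-error term is on a different scale from the two classification terms.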