As the demand for human-friendly computing increases, research on pupil tracking to facilitate human–machine interactions (HCIs) is being actively conducted. Several successful pupil tracking approaches have been developed using images and a deep neural network (DNN). However, common DNN-based methods not only require tremendous computing power and energy consumption for learning and prediction; they also have a demerit in that an interpretation is impossible because a black-box model with an unknown prediction process is applied. In this study, we propose a lightweight pupil tracking algorithm for on-device machine learning (ML) using a fast and accurate cascade deep regression forest (RF) instead of a DNN. Pupil estimation is applied in a coarse-to-fine manner in a layer-by-layer RF structure, and each RF is simplified using the proposed rule distillation algorithm for removing unimportant rules constituting the RF. The goal of the proposed algorithm is to produce a more transparent and adoptable model for application to on-device ML systems, while maintaining a precise pupil tracking performance. Our proposed method experimentally achieves an outstanding speed, a reduction in the number of parameters, and a better pupil tracking performance compared to several other state-of-the-art methods using only a CPU.