Mixed non-motorized traffic is largely unaffected by motor vehicle congestion and offers high accessibility and convenience, making it a primary mode of “last-mile” transportation in urban areas. To advance stochastic capacity estimation methods and provide reliable assessments of non-motorized roadway capacity, this study proposes a stochastic capacity estimation model based on power spectral analysis. The model treats discrete traffic flow data as a time-series signal and fits the stochastic traffic flow patterns with a stochastic signal parameter model. First, UAVs and video cameras are used to record videos of mixed non-motorized traffic flow. The videos are processed with an image detection algorithm based on the YOLO convolutional neural network and a video tracking algorithm based on the DeepSORT multi-target tracking model, yielding data on traffic flow, density, speed, and rider characteristics. Next, the autocorrelation and partial autocorrelation functions of the signal are used to choose among four classical stochastic signal parameter models, and the model parameters are selected by minimizing the Akaike information criterion (AIC) to identify the best-fitting model. The fitted parametric model is then transformed from the time domain to the frequency domain, from which the power spectrum estimation model is derived. The experimental results show that the stochastic capacity model yields a pure EV capacity of 2060–3297 bikes/(h·m) and a pure bicycle capacity of 1538–2460 bikes/(h·m), whereas the density–flow model yields a pure EV capacity of 2349–2897 bikes/(h·m) and a pure bicycle capacity of 1753–2173 bikes/(h·m). Both density–flow ranges fall within the corresponding stochastic-model ranges, which supports the effectiveness of the proposed model. These findings are of practical significance for addressing urban road congestion.
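
The parametric step described above (fit a stochastic signal model to the flow series, select it by AIC, then move to the frequency domain) can be illustrated with a minimal sketch. Several things here are assumptions rather than the study's method: the abstract does not name the four candidate models, so only an autoregressive (AR) model is shown; the coefficients are estimated with the Yule-Walker equations; the AIC is taken in the common Gaussian form n·ln(σ²) + 2p; and the series `flow` is synthetic, not the study's data. The function names are hypothetical.

```python
import numpy as np


def fit_ar_yule_walker(x, order):
    """Fit an AR(order) model by Yule-Walker; return (coeffs, noise_variance)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # work with the zero-mean signal
    n = len(x)
    # Biased sample autocovariance at lags 0..order
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - np.dot(a, r[1:])
    return a, sigma2


def select_order_by_aic(x, max_order):
    """Return (order, coeffs, noise_variance) minimizing AIC = n*ln(sigma2) + 2p."""
    n = len(x)
    best = None
    for p in range(1, max_order + 1):
        a, sigma2 = fit_ar_yule_walker(x, p)
        aic = n * np.log(sigma2) + 2 * p
        if best is None or aic < best[0]:
            best = (aic, p, a, sigma2)
    return best[1], best[2], best[3]


def ar_power_spectrum(a, sigma2, n_freqs=256):
    """Parametric PSD of x_t = sum_k a_k x_{t-k} + e_t on normalized frequencies [0, 0.5]."""
    freqs = np.linspace(0.0, 0.5, n_freqs)
    k = np.arange(1, len(a) + 1)
    # A(f) = 1 - sum_k a_k * exp(-i 2 pi f k)
    A = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ a
    return freqs, sigma2 / np.abs(A) ** 2


# Synthetic stand-in for a flow time series (assumption: AR(2) around a mean of 40)
rng = np.random.default_rng(0)
n = 3000
noise = rng.standard_normal(n)
flow = np.zeros(n)
for t in range(2, n):
    flow[t] = 0.6 * flow[t - 1] - 0.3 * flow[t - 2] + noise[t]
flow += 40.0

order, coeffs, sigma2 = select_order_by_aic(flow, max_order=6)
freqs, psd = ar_power_spectrum(coeffs, sigma2)
print(order, np.round(coeffs, 2))
```

In this sketch the spectrum follows directly from the fitted coefficients, S(f) = σ² / |1 − Σ aₖ e^(−i2πfk)|², so no periodogram of the raw data is needed. How the study maps the resulting spectrum to a capacity range is not stated in the abstract and is not attempted here.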