Precision agriculture covers a wide range of information and communications technologies aimed at addressing current challenges in crop estimation, productivity increase, and food security. In particular, crop yield estimation can provide valuable information on crop distribution, which helps optimise logistics and harvest timing. This paper focuses on deep learning-based regression solutions for estimating the number of visible oranges on trees from real-world crop row videos captured by a camera mounted on a farm vehicle. Count predictions based on individual frames were compared with those based on variable-size sequences of frames centred on each tree (video segments). The performance of three deep neural networks designed for regression was evaluated in terms of the regression error and the uncertainty of the estimates, and differences were analysed using nonparametric hypothesis testing. Experiments were conducted on a new, publicly available dataset composed of annotated video segments of orange tree rows acquired under uncontrolled conditions. Results provide statistical evidence of the value of considering multiple frames and of the feasibility of yield estimation by regression in the wild. These findings are expected to contribute to optimising decision-making in crop resource management. Unlike most efforts so far, which count fruits by detection in tree images that are usually captured manually, this work explores counting fruits by regression on trees from real-world video data.