Knowledge about the expected interaction duration and expected distance from which users will interact with public displays can be useful in many ways. For example, knowing upfront that a certain setup will lead to shorter interactions can nudge space owners to alter the setup. If a system can predict that incoming users will interact at a long distance for a short amount of time, it can accordingly show shorter versions of content (e.g., videos/advertisements) and employ at-a-distance interaction modalities (e.g., mid-air gestures). In this work, we propose a method to build models for predicting users' interaction duration and distance in public display environments, focusing on mid-air gestural interactive displays. First, we report our findings from a field study showing that multiple variables, such as audience size and behaviour, significantly influence interaction duration and distance. We then train predictor models using contextual data, based on the same variables. By applying our method to a mid-air gestural interactive public display deployment, we build a model that predicts interaction duration with an average error of about 8 s, and interaction distance with an average error of about 35 cm. We discuss how researchers and practitioners can use our work to build their own predictor models, and how they can use them to optimise their deployment.