The increasingly alarming impacts of climate change are already apparent in viticulture, with unexpected pest outbreaks among the most concerning consequences. Pest monitoring is currently performed by deploying chromotropic and delta traps, which attract insects present in the production environment and allow human operators to identify and count them. While these traps are still mostly inspected visually by winegrowers, smartphone image acquisition of the traps is starting to play a key role in tracking the evolution of pest populations. It also enables taxonomy specialists to monitor the traps remotely and better assess the onset of outbreaks. This paper presents a new methodology that embeds artificial intelligence into mobile devices to enable hand-held image capture of insect traps deployed in vineyards for pest detection. Our methodology combines different computer vision approaches that improve several aspects of image capture quality and adequacy, namely: (i) image focus validation; (ii) shadow and reflection validation; (iii) trap type detection; (iv) trap segmentation; and (v) perspective correction. A total of 516 images were collected, divided into three datasets, and manually annotated to support the development and validation of the different functionalities. With this approach, we achieved an accuracy of 84% for focus detection, accuracies of 80% and 96% for shadow/reflection detection (for delta and chromotropic traps, respectively), and a mean Jaccard index of 97% for trap segmentation.
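Segmentation quality is reported as a mean Jaccard index, the intersection over union between a predicted trap mask and its manual annotation. The sketch below shows how such a score can be computed. It is an illustration, not the authors' evaluation code, and it rests on several assumptions. Masks are represented as binary 2D grids where 1 marks a trap pixel. The mean is taken per image over the annotated set. Two empty masks are scored as a perfect match. The function names `jaccard` and `mean_jaccard` are hypothetical.

```python
from typing import List, Sequence

# Assumed representation: a binary mask as rows of 0/1 values (1 = trap pixel).
Mask = Sequence[Sequence[int]]


def jaccard(pred: Mask, truth: Mask) -> float:
    """Jaccard index (intersection over union) of two same-shaped binary masks."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            p_on, t_on = bool(p), bool(t)
            inter += int(p_on and t_on)  # pixel labelled trap in both masks
            union += int(p_on or t_on)   # pixel labelled trap in either mask
    # Assumed convention: if neither mask contains the trap, they agree perfectly.
    return 1.0 if union == 0 else inter / union


def mean_jaccard(preds: List[Mask], truths: List[Mask]) -> float:
    """Average the per-image Jaccard index over a set of images (assumed averaging)."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("need the same, non-zero number of predicted and annotated masks")
    scores = [jaccard(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 1, 0]]
    annotated = [[1, 0, 0], [0, 1, 1]]
    # 2 overlapping trap pixels out of 4 pixels marked in either mask.
    print(jaccard(predicted, annotated))  # 0.5
```

Counting pixels directly keeps the definition explicit. In practice the same quantity would usually be computed on image arrays, but the ratio is identical: pixels marked as trap in both masks divided by pixels marked as trap in either.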