This study explores the convergence of user experience (UX) and machine learning, applying computer vision techniques to preprocess audiovisual data and detect user interface (UI) elements. Focusing on usability testing, the study introduces a novel approach for recognizing changes in UI screens within video recordings. The methodology proceeds through form prototype creation, laboratory experiments, data analysis, and computer vision processing. The longer-term aim is to automate the evaluation of user behavior during UX testing. This approach is particularly relevant to the agricultural domain, where specialized applications for precision agriculture, subsidy requests, and production reporting demand streamlined usability. The research introduces a frame extraction algorithm that identifies screen changes by analyzing pixel differences between consecutive frames. Additionally, the study employs YOLOv7, an efficient object detection model, to identify UI elements within the extracted video frames. Results show reliable screen-change detection with few false negatives and an acceptable rate of false positives, demonstrating the potential for greater automation in UX testing. The study's implications lie in simplifying analysis processes, enhancing insights for design decisions, and fostering user-centric advancements in diverse sectors, including precision agriculture.
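To make the frame extraction idea concrete, the sketch below compares consecutive grayscale frames and yields only those where the fraction of changed pixels exceeds a threshold. This is a minimal illustration of pixel-difference screen-change detection, not the paper's exact implementation; the intensity and fraction thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

def extract_change_frames(video_path, diff_threshold=0.05):
    """Yield (frame_index, frame) pairs where the screen content changes.

    Minimal sketch: a frame counts as a "screen change" when the fraction
    of pixels whose grayscale intensity shifts by more than 25 levels
    exceeds diff_threshold. Both thresholds are assumed values.
    """
    cap = cv2.VideoCapture(video_path)
    prev_gray = None
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            # Fraction of pixels that changed noticeably between frames.
            changed = np.mean(cv2.absdiff(gray, prev_gray) > 25)
            if changed > diff_threshold:
                yield index, frame
        prev_gray = gray
        index += 1
    cap.release()
```

Comparing against only the previous yielded frame, or adding light blurring before differencing, are common variations for suppressing noise such as cursor blinks; the abstract does not specify which variant the study uses.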
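For the UI-element detection step, a YOLOv7 model can be run on each extracted frame. The following sketch assumes the torch.hub entry point published in the WongKinYiu/yolov7 repository and a hypothetical custom checkpoint (`ui_elements.pt`) fine-tuned on UI-element classes; the loading path, weight file, and post-processing calls are assumptions rather than the authors' confirmed pipeline.

```python
import torch

# Assumed: loading a custom-trained YOLOv7 checkpoint through torch.hub;
# 'ui_elements.pt' is a hypothetical weight file for UI-element classes.
model = torch.hub.load('WongKinYiu/yolov7', 'custom', 'ui_elements.pt')

def detect_ui_elements(frame):
    """Detect UI elements in one extracted frame (BGR ndarray from OpenCV)."""
    results = model(frame[..., ::-1])  # convert BGR to RGB for the model
    # Bounding boxes, confidences, and class labels for the first image.
    return results.pandas().xyxy[0]

# Example: run detection on every frame flagged as a screen change.
# for index, frame in extract_change_frames("session_recording.mp4"):
#     print(index, detect_ui_elements(frame))
```

Chaining the two steps this way means the detector only runs on frames where the screen actually changed, which is what makes the combined pipeline cheap enough to apply to full usability-testing recordings.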