Recent advancements in remote sensing and artificial intelligence can potentially revolutionize the automated detection of archaeological sites. However, the challenging task of interpreting remote sensing imagery combined with the intricate shapes of archaeological sites can hinder the performance of computer vision systems. This work presents a computer vision system trained for efficient hillfort detection in remote sensing imagery. Equipped with an adapted multimodal semantic segmentation model, the system integrates LiDAR‐derived LRM images and aerial orthoimages for feature fusion, generating a binary mask pinpointing detected hillforts. Post‐processing includes margin and area filters to remove edge inferences and smaller anomalies. The resulting inferences are subjected to hard positive and negative mining, where expert archaeologists classify them to populate the training data with new samples for retraining the segmentation model. As the computer vision system is far more likely to encounter background images during its search, the training data are intentionally biased towards negative examples. This approach aims to reduce the number of false positives, typically seen when applying machine learning solutions to remote sensing imagery. Northwest Iberia experiments witnessed a drastic reduction in false positives, from 5678 to 40 after a single hard positive and negative mining iteration, yielding a 99.3% reduction, with a resulting F1 score of 66%. In England experiments, the system achieved a 59% F1 score when fine‐tuned and deployed countrywide. Its scalability to diverse archaeological sites is demonstrated by successfully detecting hillforts and other types of enclosures despite their typical complex and varied shapes. Future work will explore archaeological predictive modelling to identify regions with higher archaeological potential to focus the search, addressing processing time challenges.