Screening references is a time‐consuming step necessary for systematic reviews and guideline development. Previous studies have shown that human effort can be reduced by using machine learning software to prioritise large reference collections such that most of the relevant references are identified before screening is completed. We describe and evaluate RobotAnalyst, a Web‐based software system that combines text‐mining and machine learning algorithms for organising references by their content and actively prioritising them based on a relevancy classification model trained and updated throughout the process. We report an evaluation over 22 reference collections (most are related to public health topics) screened using RobotAnalyst with a total of 43 610 abstract‐level decisions. The number of references that needed to be screened to identify 95% of the abstract‐level inclusions for the evidence review was reduced on 19 of the 22 collections. Significant gains over random sampling were achieved for all reviews conducted with active prioritisation, as compared with only two of five when prioritisation was not used. RobotAnalyst's descriptive clustering and topic modelling functionalities were also evaluated by public health analysts. Descriptive clustering provided more coherent organisation than topic modelling, and the content of the clusters was apparent to the users across a varying number of clusters. This is the first large‐scale study using technology‐assisted screening to perform new reviews, and the positive results provide empirical evidence that RobotAnalyst can accelerate the identification of relevant studies. The results also highlight the issue of user complacency and the need for a stopping criterion to realise the work savings.