Systematic reviews are crucial but time-consuming and labor-intensive, as they involve managing large numbers of studies. Active learning techniques can improve the efficiency of screening by prioritizing the studies most likely to be relevant. The performance of these techniques can only be evaluated on fully labeled datasets, which are not always available. The main goal of the current paper was to create such a dataset by reconstructing a fully labeled dataset from the search queries, the number of results for each query, the list of included papers, and the number of initially screened articles. A systematic review of the treatment of Borderline Personality Disorder, which correctly followed the PRISMA guidelines for reporting systematic reviews, served as our case study. The reconstructed dataset (k = 1053) did not exactly match the initial dataset (k = 1013), due to mismatches in the closed-source search tools, retracted papers, and other factors outside the original authors' control. Consequently, although the reconstructed dataset contained all initially relevant records, we could not simply label all other records as irrelevant; within the label noise, additional relevant records could be present. We therefore developed a noisy label filter (NLF) procedure to deal with unknown labels. After applying the NLF procedure, we used the reconstructed dataset in a simulation study with the open-source software ASReview. On average, 77.36% of screening time could have been saved, and Naïve Bayes provided the best model fit (work saved over sampling = 82.30%). In the discussion section, we provide recommendations and a decision tree for reconstructing datasets.
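For readers unfamiliar with the evaluation metric, the following minimal Python sketch illustrates the general definition of work saved over sampling (WSS): the fraction of screening effort saved relative to random screening at a fixed recall level (here 95%). This is an illustration of the standard metric only, not the code used in our simulation study; the function and variable names are hypothetical.

```python
import math

def wss_at(ranked_labels, recall_level=0.95):
    """Work saved over sampling (WSS) at a given recall level.

    ranked_labels: relevance labels (1 = relevant, 0 = irrelevant) in the
    order the active learning model presents records to the screener.
    """
    n_total = len(ranked_labels)
    n_relevant = sum(ranked_labels)
    target = math.ceil(recall_level * n_relevant)
    found = 0
    for n_screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= target:
            # Fraction of records left unscreened, minus the fraction
            # that random screening would leave at the same recall.
            return (n_total - n_screened) / n_total - (1 - recall_level)
    raise ValueError("Recall level never reached in the ranking.")
```

For example, a perfect ranking that places 10 relevant records at the top of 100 would yield wss_at of 0.85 at the 95% recall level, since only 10 of the 100 records would need to be screened.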