Purpose
To investigate and implement semiautomated screening for meta-analyses (MA) in urology under consideration of class imbalance.
Methods
Machine learning algorithms were trained on data from three MA with detailed information of the screening process. Different methods to account for class imbalance (Sampling (up- and downsampling, weighting and cost-sensitive learning), thresholding) were implemented in different machine learning (ML) algorithms (Random Forest, Logistic Regression with Elastic Net Regularization, Support Vector Machines). Models were optimized for sensitivity. Besides metrics such as specificity, receiver operating curves, total missed studies, and work saved over sampling were calculated.
Results
During training, models trained after downsampling achieved the best results consistently among all algorithms. Computing time ranged between 251 and 5834 s. However, when evaluated on the final test data set, the weighting approach performed best. In addition, thresholding helped to improve results as compared to the standard of 0.5. However, due to heterogeneity of results no clear recommendation can be made for a universal sample size. Misses of relevant studies were 0 for the optimized models except for one review.
Conclusion
It will be necessary to design a holistic methodology that implements the presented methods in a practical manner, but also takes into account other algorithms and the most sophisticated methods for text preprocessing. In addition, the different methods of a cost-sensitive learning approach can be the subject of further investigations.