“…Reviewing these publications revealed many studies that have proposed benchmark datasets, and in a few cases benchmark tasks that meet our definition (see Section 2.1 for this definition), across a wide array of medical disciplines. For example, there have been benchmark datasets proposed for peripheral blood cell recognition (Acevedo et al, 2020), brain tumor image segmentation (Menze et al, 2014), tuberculosis identification from X-ray images (Jaeger et al, 2014), cervical cytology analysis (Zhang et al, 2019), glaucoma detection (Salam et al, 2017), ischemic stroke lesion segmentation from MRI images (Maier et al, 2017), seizure detection (Harati et al, 2014), human activity sensing and motion assessment (Kawaguchi et al, 2011; Ebert et al, 2017), voice disorder detection (Cesari et al, 2018), demographic trait detection from clinical notes (Feder et al, 2020), biomedical knowledge link prediction (Breit et al, 2020), molecular machine learning (Wu et al, 2018), ECG interpretation (Wagner et al, 2020), ICU predictions such as mortality, length of stay, patient decline, and phenotyping (Harutyunyan et al, 2019; Purushotham et al, 2018; Sheikhalishahi et al, 2020), neurodegenerative disorder diagnosis (Tagaris et al, 2018), prostate cancer survival prediction (Guinney et al, 2017), and several tasks from the UCI machine learning repository such as predicting chronic kidney disease, diabetes, breast cancer and more (Dua and Graff, 2017).…”