“…Traditionally, gold-standard labels are assigned by manual review of patient records [37,85,116]. To enable more rapid development of labeled datasets, labels have also been derived from registry data [33], laboratory results [61,112,117], diagnosis codes [30,57,58,118–120], and rule-based algorithms [59,121–123]. The most commonly used methods for classifying a binary phenotype are random forests [26,28,35,37,56,57,60,62,70,81,84,117,119,120,124–126], logistic regression [36,37,57,58,60,67,82,84,93,116,117,119,125,127,128], and support vector machines (SVMs) [31,35,37,58,60,81,82,84,92,97,104,116,125,126] (Supplementary Material Table S12).…”