“…Next, we compared our best baseline and Xpresso models with the reported results from existing models described in the literature, delineating five categories based upon the types of features used either as input data or as intermediate training stages: (1) those using nothing more than sequence features, which included our method and three others (Abdalla et al, 2018; Bessière et al, 2018; McLeay et al, 2012); (2) those using MPRAs to measure promoter activity (van Arensbergen et al, 2017; Cooper et al, 2006; Landolin et al, 2010; Nguyen et al, 2016); (3) those using the binding signal of TFs at promoter regions, as measured by ChIP (Cheng et al, 2011, 2012; McLeay et al, 2012; Ouyang et al, 2009; Zhou et al, 2018); (4) those using the signal of histone marks, such as H3K4me1, H3K4me3, H3K9me3, H3K27Ac, H3K27me3, and H3K36me3 at promoters and gene bodies, as measured by ChIP (Abdalla et al, 2018; Cheng et al, 2011; Dong et al, 2012; Karlić et al, 2010; McLeay et al, 2012; Schmidt et al, 2017; Zhou et al, 2018); and (5) those using the DNase hypersensitivity signal at promoters and nearby enhancers (Dong et al, 2012; Duren et al, 2017; McLeay et al, 2012; Schmidt et al, 2017; Zhou et al, 2018) (Figure 4E). Many of these models were trained and tested on cell lines, such as K562, GM12878, and mESCs, for which ChIP data are available for a multitude of histone marks and TFs.…”