“…In addition, we compare our model with those from previous studies, including conventional image captioning models, e.g., ST (Vinyals et al., 2015), ATT2IN (Rennie et al., 2017), ADAATT (Lu et al., 2017), and TOPDOWN (Anderson et al., 2018), and those proposed for the medical domain, e.g., COATT, HRGR, and CMAS-RL (Jing et al., 2019).…”