“…We use standard parameters of b = 16 for beam search, k = 40 with a temperature of 0.7 for top-k sampling, and p = 0.95 for nucleus sampling (Holtzman et al., 2019). For bias evaluation, we use existing NLG bias metrics: regard ratio (Sheng et al., 2019), sentiment ratio (Groenwold et al., 2020), individual and group fairness through sentiment (IF/GF) (Huang et al., 2020), and a gendered word co-occurrence score (Bordia and Bowman, 2019). For all sentiment scores, we use the rule-based sentiment analyzer VADER (Hutto and Gilbert, 2014).…”
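
To make the sampling parameters in the passage concrete, the following is a minimal sketch of how temperature-scaled top-k filtering and nucleus (top-p) filtering reshape a next-token distribution. It is not the authors' implementation: the function names, the six-token toy vocabulary, and the toy logits are all assumptions made for illustration, and k is set to 3 here only because the toy vocabulary is far smaller than the k = 40 quoted above.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize their mass."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}


def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}


# Hypothetical logits over a 6-token toy vocabulary (illustration only).
toy_logits = [2.0, 1.0, 0.5, 0.0, -1.0, -3.0]

# Top-k with the quoted temperature of 0.7 (k reduced to 3 for the toy vocabulary).
top_k_dist = top_k_filter(softmax(toy_logits, temperature=0.7), k=3)

# Nucleus filtering with the quoted p = 0.95.
nucleus_dist = nucleus_filter(softmax(toy_logits), p=0.95)
```

A token would then be drawn from either filtered distribution, for example with `random.choices(list(dist), weights=list(dist.values()))`. Beam search with b = 16 is a deterministic search rather than a filter on the distribution, so it is not covered by this sketch.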