“…In addition to those already cited, numerous other recent papers have evaluated word embeddings by benchmarking on analogy questions (Mikolov et al., 2013b; Garten et al., 2015; Lofi et al., 2016). There is some consensus regarding performance across question types: systems do well on questions of inflectional morphology (especially for English (Nicolai et al., 2015)), but perform far less reliably on various non-geographical semantic questions, although some gains in performance are possible by adjusting the embedding algorithms used or their hyperparameters (Levy et al., 2015), or by training further on subproblems (Drozd et al., 2016).…”
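
For readers unfamiliar with how such analogy benchmarks are scored, the following is a minimal illustrative sketch (not taken from any of the cited papers) of the standard vector-offset method ("3CosAdd") commonly used to answer a question a : a* :: b : ?. The names answer_analogy and embeddings, a dict mapping words to unit-normalised numpy vectors, are assumptions for illustration only.

    # Illustrative sketch: vector-offset ("3CosAdd") analogy answering.
    # `embeddings` is assumed to map words to unit-normalised numpy vectors.
    import numpy as np

    def answer_analogy(embeddings, a, a_star, b):
        # Target direction: b + (a* - a); e.g. "king" + ("woman" - "man") ~ "queen"
        target = embeddings[b] + embeddings[a_star] - embeddings[a]
        target /= np.linalg.norm(target)
        best_word, best_sim = None, -np.inf
        for word, vec in embeddings.items():
            if word in (a, a_star, b):      # question words are excluded by convention
                continue
            sim = float(np.dot(vec, target))  # cosine similarity (unit-length vectors)
            if sim > best_sim:
                best_word, best_sim = word, sim
        return best_word

A benchmark question is counted correct when the returned word matches the gold answer; accuracy is then reported per question type (morphological, semantic, etc.).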