“…The particular type of LM considered in this paper can flexibly model paraphrase mappings at the word, phrase, and sentence levels. As the LM probabilities are estimated in the paraphrased domain, these models are referred to as paraphrastic language models (PLMs) [16,17]. For an $L$-word sequence $W = \langle w_1, w_2, \dots, w_i, \dots, w_L \rangle$ in the training data, rather than maximizing the surface word sequence's log-probability $\ln P(W)$, as is done for conventional LMs, the marginal probability over all paraphrase variant sequences is maximized,…”
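The quotation breaks off before the criterion itself, so the exact equation is not recoverable from this excerpt. As a hedged illustration only, a marginalized training criterion of this general kind might take the following shape, where $\hat{W}$ ranges over paraphrase variants of $W$; the symbols $\hat{W}$, $P(W \mid \hat{W})$, and $P(\hat{W})$ are illustrative assumptions, not taken from the quoted source:

% Hedged sketch of one plausible form of the marginalized criterion.
% \hat{W} ranges over paraphrase variants; P(W | \hat{W}) is an assumed
% paraphrase mapping probability and P(\hat{W}) an assumed LM probability
% in the paraphrased domain. Not necessarily the paper's exact equation.
\begin{equation}
  \mathcal{F}_{\mathrm{PLM}}
    = \ln \sum_{\hat{W}} P\bigl(W \mid \hat{W}\bigr)\, P\bigl(\hat{W}\bigr)
\end{equation}

Under this reading, the surface sequence $W$ is treated as one realization among many paraphrase variants, so training statistics gathered from any variant can contribute probability mass to all of them.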