In multidimensional forced-choice (MFC) questionnaires, items measuring different attributes are presented in blocks, and participants have to rank-order the items within each block (fully or partially). Such comparative formats can reduce the impact of numerous response biases that often affect single-stimulus items (also known as rating or Likert scales). However, if scored with traditional methodology, MFC instruments produce ipsative data, whereby all individuals have a common total test score. Ipsative scoring distorts individual profiles (it is impossible to achieve all high or all low scale scores), construct validity (covariances between scales must sum to zero), criterion-related validity (validity coefficients must sum to zero), and reliability estimates; a brief derivation of these sum-to-zero constraints is sketched at the end of this section. We argue that these problems are caused by inadequate scoring of forced-choice items, and we advocate the use of item response theory (IRT) models based on an appropriate response process for comparative data, such as Thurstone's Law of Comparative Judgment. We show that, by applying Thurstonian IRT modeling (Brown & Maydeu-Olivares, 2011), even existing forced-choice questionnaires with challenging features can be scored adequately, and that the IRT-estimated scores are free from the problems of ipsative data.

Assessments of personality, social attitudes, interests, motivation, psychopathology, and well-being rely largely on respondent-reported measures. Most such measures employ the so-called single-stimulus format, where respondents evaluate one question (or item) at a time, often in relation to a rating scale (i.e., Likert-type items). Because the respondents rate each item separately from other items, they make absolute judgments about the extent to which the item describes their personality, attitudes, and so on. Although simple to answer and score, and therefore popular with test takers and test users, the single-stimulus format makes several assumptions about respondents' rating behaviors that are often unrealistic. For instance, the use of rating scales relies on the assumption that respondents interpret category labels in the same way. This assumption is rarely tested in practice, but the research available on the issue suggests that the interpretation and meaning of response categories vary from one respondent to another (Friedman & Amoo, 1999). Furthermore, individual response styles vary (Van Herk, Poortinga, & Verhallen, 2004), so that some respondents avoid extreme categories (central tendency responding), whereas others prefer them (extreme responding). Sometimes respondents tend to agree with both positive and negative statements as presented (acquiescence bias).

Another common problem is getting respondents to differentiate between the ratings they give to single-stimulus items. When rating another person's attributes or behavior (as in 360-degree feedback), respondents commonly give either high or low ratings on all behaviors (the halo/horn effect), depending on whether they judge the person to be high or low on a single important dimension. Typically, respon...
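To make the sum-to-zero constraints mentioned in the abstract concrete, here is a minimal derivation under fully ipsative scoring; the notation ($s_{ik}$ for person $i$'s score on scale $k$ of $K$ scales, constant $c$, criterion $y$) is illustrative rather than taken from any particular instrument. Because every ranking block allocates the same fixed set of rank points, each person's scores sum to the same constant:

\[
\sum_{k=1}^{K} s_{ik} = c \quad \text{for every person } i.
\]

Consequently, for any scale $j$,

\[
\sum_{k=1}^{K} \operatorname{Cov}(s_j, s_k) = \operatorname{Cov}\!\Big(s_j, \sum_{k=1}^{K} s_k\Big) = \operatorname{Cov}(s_j, c) = 0,
\]

so the covariances of scale $j$ with the remaining scales must sum to $-\operatorname{Var}(s_j)$, forcing spurious negative covariances regardless of the true relations among the attributes. Likewise, for any external criterion $y$,

\[
\sum_{k=1}^{K} \operatorname{Cov}(s_k, y) = \operatorname{Cov}(c, y) = 0,
\]

so covariance-metric validity coefficients sum exactly to zero (correlation-metric coefficients do so only approximately, when scale variances are unequal).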