The use of reliability estimates is increasingly scrutinized as scholars become more aware that test–retest stability and self–other agreement provide a better approximation of the theoretical and practical usefulness of an instrument than its internal reliability. In this study, we investigate item characteristics that potentially affect single‐item internal reliability, retest reliability, and self–other agreement. Across two large samples (N = 6690 and N = 4396), two countries (Estonia and the Netherlands), and two personality inventories (the NEO PI‐3 and the HEXACO‐PI‐R), results show that (i) item variance is a strong predictor of self–other agreement and retest reliability but not of single‐item internal reliability; (ii) item variance mediates the relations between evaluativeness and self–other agreement; and (iii) self–other agreement is predicted by observability and item domain. On the whole, item length, negations, and item position (indicating effects of questionnaire length) were weakly related to single‐item internal reliability, retest reliability, and self–other agreement. Our findings suggest that, to increase the predictive validity of personality scales, researchers constructing questionnaire items should pay close attention especially to item variance, but also to evaluativeness and observability. Copyright © 2016 European Association of Personality Psychology