Bringing precision to the understanding and treatment of mental disorders requires instruments for studying clinically relevant individual differences. One promising approach is the development of computational assays: integrating computational models with cognitive tasks to infer latent, patient-specific disease processes in brain computations. While recent years have seen many methodological advances in computational modeling and many cross-sectional patient studies, much less attention has been paid to the basic psychometric properties (reliability and construct validity) of the computational measures these assays provide. In this review, we assess the extent of this issue by examining emerging empirical evidence. To contextualize this, we also provide a more general perspective on the key developments needed for translating computational assays to clinical practice. Emerging evidence suggests that most computational measures show poor-to-moderate reliability and often provide little improvement over simple behavioral measures. Furthermore, the behavioral and computational measures used to test computational accounts of mental disorders show a lack of convergent validity, which compromises their interpretability. Taken together, these issues risk invalidating previous findings and undermining ongoing research efforts that use computational assays to study individual (and even group) differences. We suggest that cross-sectional single-task designs, which currently dominate the research landscape, are partly responsible for these problems and are therefore ill-suited to solving them. Instead, reliability and construct validity need to be studied more systematically, using longitudinal designs with batteries of tasks. Finally, enabling clinical applications will require establishing predictive and longitudinal validity and making the assays more efficient and less burdensome.