Environmental risk assessment (ERA) of chemicals relies on the combination of exposure and effects assessment. Exposure concentrations are commonly estimated using mechanistic fate models, but the effects side is restricted to descriptive statistical treatment of toxicity data. Mechanistic effect models are gaining interest in a regulatory context, which has also sparked discussions on model quality and good modeling practice. Proposals for good modeling practice of effect models currently focus largely on population and community models, whereas effect models also exist at the individual level, where they fall into the category of toxicokinetic-toxicodynamic (TKTD) models. In contrast to the higher-level models, TKTD models are usually completely parameterized by fitting them to experimental data. In fact, one of their explicit aims is to replace descriptive methods for data analysis. Furthermore, the construction of these models does not fit into an orderly modeling cycle, given that most TKTD models have been under continuous development for decades and are being applied by many different research groups for many different purposes. These aspects have considerable consequences for the application of frameworks for model evaluation. For example, classical sensitivity analysis becomes rather meaningless when all model parameters are fitted to a data set. We illustrate these issues with the General Unified Threshold model for Survival (GUTS), relate them to the quality issues for currently used models in ERA, and provide recommendations for the evaluation of TKTD models and their analyses. Integr Environ Assess Manag 2018;14:604-614. ©2018 SETAC.