Divergent thinking (DT) ability is widely regarded as a central cognitive capacity underlying creativity, but its assessment is challenged by the fact that DT tasks yield a variable number of responses. Various approaches to scoring DT tasks have been proposed, which differ in how responses are evaluated and aggregated within a task. The present study aimed to identify methods that maximize psychometric quality while also reducing the confounding effect of DT fluency. We compared traditional scoring approaches (summative and average scoring) to more recent methods such as snapshot scoring as well as top- and max-scoring. We further explored the moderating role of task complexity and metacognitive abilities. A sample of 300 participants was recruited via Prolific. Reliability evidence was assessed in terms of internal consistency, and concurrent criterion validity in terms of correlations with real-life creative behavior, creative self-beliefs, and openness. Findings confirm that alternative aggregation methods reduce the confounding effect of DT fluency. Reliability tends to increase with the number of included responses, with three responses as a minimum requirement for acceptable reliability evidence. Convergent validity was highest for snapshot scoring and max-scoring when based on a moderate number of three ideas.
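
As a rough illustration of how the aggregation rules mentioned above differ, the following minimal sketch computes summative, average, max-, and top-k scores from a set of per-response creativity ratings. The function name, rating scale, and arguments are hypothetical and not taken from the study's materials; snapshot scoring is omitted because it rates the whole idea set holistically rather than aggregating per-response ratings.

```python
import statistics

def aggregate_dt_scores(ratings, method="average", top_k=3):
    """Aggregate per-response creativity ratings for one DT task.

    `ratings` is a list of creativity ratings for one participant's
    responses (e.g., on a 1-5 scale). Names and arguments are
    illustrative assumptions, not the study's implementation.
    """
    if not ratings:
        return None
    if method == "summative":   # sum over all responses (confounded with fluency)
        return sum(ratings)
    if method == "average":     # mean over all responses
        return statistics.mean(ratings)
    if method == "max":         # rating of the single best response
        return max(ratings)
    if method == "top":         # mean of the k best-rated responses
        best = sorted(ratings, reverse=True)[:top_k]
        return statistics.mean(best)
    raise ValueError(f"unknown method: {method}")


# Example: five responses rated on a hypothetical 1-5 creativity scale
ratings = [2, 4, 3, 5, 1]
print(aggregate_dt_scores(ratings, "summative"))      # 15
print(aggregate_dt_scores(ratings, "average"))        # 3
print(aggregate_dt_scores(ratings, "max"))            # 5
print(aggregate_dt_scores(ratings, "top", top_k=3))   # 4
```

The contrast in the example shows why summative scoring is confounded with fluency: producing more responses raises the sum regardless of their quality, whereas average, max-, and top-k scores depend only on how good the rated responses are.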