This study develops a framework to conceptualize the use and evolution of machine learning (ML) in science assessment. We systematically reviewed 47 studies that applied ML in science assessment and classified them into five categories: (a) constructed response, (b) essay, (c) simulation, (d) educational game, and (e) inter‐discipline. We compared the ML‐based and conventional science assessments and extracted 12 critical characteristics to map three variables in a three‐dimensional framework: construct, functionality, and automaticity. The 12 characteristics used to construct a profile for ML‐based science assessments for each article were further analyzed by a two‐step cluster analysis. The clusters identified for each variable were summarized into four levels to illustrate the evolution of each. We further conducted cluster analysis to identify four classes of assessment across the three variables. Based on the analysis, we conclude that ML has transformed—but not yet redefined—conventional science assessment practice in terms of fundamental purpose, the nature of the science assessment, and the relevant assessment challenges. Along with the three‐dimensional framework, we propose five anticipated trends for incorporating ML in science assessment practice for future studies: addressing developmental cognition, changing the process of educational decision making, personalized science learning, borrowing 'good' to advance 'good', and integrating knowledge from other disciplines into science assessment.