Bipolar disorder (BD) is a mood disorder involving recurring (hypo)manic and depressive episodes. The inherently temporal nature of BD has inspired its conceptualization using dynamical systems theory, a mathematical framework for understanding systems that evolve over time. In this paper, we provide a critical review of the dynamical systems models of BD. Owing to the heterogeneity of methodological and experimental designs in computational modeling, we designed a structured approach that parallels the appraisal of animal models by their face, predictive, and construct validity. This tool, the validity appraisal guide for computational models (VAG-CM), is not an absolute measure of validity, but rather a guide for a more objective appraisal of the models in this review. We identified 26 studies published before November 18, 2021 that proposed generative dynamical systems models of time-varying signals in BD. Two raters independently applied the VAG-CM to the included studies, obtaining a mean Cohen’s κ of 0.55 (95% CI [0.45, 0.64]) prior to establishing consensus ratings. Consensus VAG-CM ratings revealed three model/study clusters: data-driven models with face validity, theory-driven models with predictive validity, and theory-driven models lacking all forms of validity. We conclude that future modeling studies should employ a hybrid approach that first operationalizes BD features of interest using empirical data, to achieve face validity, and then explains those features using generative models whose components are homologous to physiological or psychological systems involved in BD, to achieve construct validity. Such models would be best developed alongside long-term prospective cohort studies that collect multimodal time-series data. We also encourage future studies to extend, modify, and evaluate the VAG-CM approach for a wider breadth of computational modeling studies and psychiatric disorders.
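To illustrate the inter-rater agreement statistic reported above, the following is a minimal sketch of Cohen's κ for two raters, computed as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohens_kappa` and the example ratings are hypothetical and are not data from this review; how κ was averaged across VAG-CM items is not specified here, so the sketch covers a single set of paired ratings only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies,
    # summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings for ten items (assumed for illustration only).
rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
rater_2 = ["yes", "no", "no", "no", "yes", "yes", "yes", "no", "yes", "no"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this hypothetical example the raters agree on 8 of 10 items (p_o = 0.8) and chance agreement is p_e = 0.5, giving κ = 0.6.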