Many machine learning problems can be formulated as minimax problems, such as Generative Adversarial Networks (GANs), AUC maximization, and robust estimation, to mention but a few. A substantial body of work is devoted to studying the convergence behavior of the associated stochastic gradient-type algorithms. In contrast, there is relatively little work on their generalization, i.e., how the learning models built from training examples would behave on test examples. In this paper, we provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems in both convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability. We establish a quantitative connection between stability and several generalization measures, both in expectation and with high probability. For the convex-concave setting, our stability analysis shows that stochastic gradient descent ascent attains optimal generalization bounds for both smooth and nonsmooth minimax problems. We also establish generalization bounds for both weakly-convex-weakly-concave and gradient-dominated problems.

The weak PD empirical risk of $(w, v)$ is defined as
$$\triangle^w_S(w, v) = \sup_{v' \in \mathcal{V}} \mathbb{E}\big[F_S(w, v')\big] - \inf_{w' \in \mathcal{W}} \mathbb{E}\big[F_S(w', v)\big].$$
We refer to $\triangle^w(w, v) - \triangle^w_S(w, v)$ as the weak PD generalization error of the model $(w, v)$.

Definition 2 (Strong Primal-Dual Risk). The strong PD population risk of a model $(w, v)$ is defined as
$$\triangle^s(w, v) = \sup_{v' \in \mathcal{V}} F(w, v') - \inf_{w' \in \mathcal{W}} F(w', v).$$
The strong PD empirical risk of $(w, v)$ is defined as
$$\triangle^s_S(w, v) = \sup_{v' \in \mathcal{V}} F_S(w, v') - \inf_{w' \in \mathcal{W}} F_S(w', v).$$
We refer to $\triangle^s(w, v) - \triangle^s_S(w, v)$ as the strong PD generalization error of the model $(w, v)$.

Definition 3 (Primal Risk). The primal population risk of a model $w$ is defined as $R(w) = \sup_{v \in \mathcal{V}} F(w, v)$. The primal empirical risk of $w$ is defined as $R_S(w) = \sup_{v \in \mathcal{V}} F_S(w, v)$. We refer to $R(w) - R_S(w)$ as the primal generalization error of the model $w$, and $R(w) - \inf_{w'} R(w')$ as the excess primal population risk.

According to the above definitions, we know $\triangle^w(w, v) \le \mathbb{E}\big[\triangle^s(w, v)\big]$, since the supremum of an expectation never exceeds the expectation of the supremum (and analogously for the infimum).
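The sup/inf structure of these risks can be made concrete on a toy objective for which both extrema have closed forms. The objective, the Gaussian data model, and all function names below are my own illustrative choices, not from the paper — a minimal sketch of the strong PD risks of Definition 2:

```python
# Toy illustration (my own construction) of the strong PD risks.
# Take the decoupled objective
#   f(w, v; z) = 0.5*(w - z)^2 - 0.5*(v - z)^2,
# with F(w, v) = E_z[f(w, v; z)] and F_S(w, v) = (1/n) sum_i f(w, v; z_i).
# For z ~ N(mu, sigma^2), the extrema have closed forms:
#   sup_{v'} F(w, v') = 0.5*E[(w - z)^2] - 0.5*Var(z)
#   inf_{w'} F(w', v) = 0.5*Var(z) - 0.5*E[(v - z)^2]
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0                  # population: z ~ N(mu, sigma^2)
z = rng.normal(mu, sigma, size=200)   # training sample S

def strong_pd_population_risk(w, v):
    # sup_{v'} F(w, v') - inf_{w'} F(w', v), via the closed forms above
    sup_term = 0.5 * ((w - mu) ** 2 + sigma ** 2) - 0.5 * sigma ** 2
    inf_term = 0.5 * sigma ** 2 - 0.5 * ((v - mu) ** 2 + sigma ** 2)
    return sup_term - inf_term

def strong_pd_empirical_risk(w, v):
    # same quantity with expectations replaced by averages over S
    zvar = z.var()
    sup_term = 0.5 * np.mean((w - z) ** 2) - 0.5 * zvar
    inf_term = 0.5 * zvar - 0.5 * np.mean((v - z) ** 2)
    return sup_term - inf_term

w, v = 0.5, 1.5
pop = strong_pd_population_risk(w, v)
emp = strong_pd_empirical_risk(w, v)
gen_error = pop - emp   # strong PD generalization error of (w, v)
print(pop, emp, gen_error)
```

For this objective the strong PD population risk simplifies to $\frac{1}{2}(w-\mu)^2 + \frac{1}{2}(v-\mu)^2$, so it is nonnegative and vanishes exactly at the saddle point $(w, v) = (\mu, \mu)$, matching the intuition that PD risks measure distance from optimality for both players.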
Furthermore, we refer to $F(w, v) - F_S(w, v)$ as the plain generalization error, as is standard in statistical learning theory (SLT). A standard approach to handling a population risk is to decompose it into a generalization error (estimation error) and an empirical risk (optimization error) (Bousquet & Bottou, 2008). For example, the weak PD population risk can be decomposed as
$$\triangle^w(w, v) = \big(\triangle^w(w, v) - \triangle^w_S(w, v)\big) + \triangle^w_S(w, v). \tag{2.2}$$
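The decomposition (2.2) can be traced numerically by training with stochastic gradient descent ascent (SGDA) and evaluating both terms. The toy objective, the data model, and the hyperparameters below are my own illustrative choices (and I evaluate the strong PD risks, which have closed forms here, rather than the weak ones, which would require averaging over algorithm randomness):

```python
# Sketch: train (w, v) by SGDA on the toy objective
#   f(w, v; z) = 0.5*(w - z)^2 - 0.5*(v - z)^2,
# then split the population PD risk into generalization error + empirical
# PD risk, as in the decomposition (2.2).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 1.0, 2.0, 500
z = rng.normal(mu, sigma, size=n)     # training sample S

w, v, eta = 0.0, 0.0, 0.1
for t in range(2000):
    zi = z[rng.integers(n)]           # draw one training point
    gw = w - zi                       # grad_w f(w, v; zi)
    gv = -(v - zi)                    # grad_v f(w, v; zi)
    w -= eta * gw                     # descent step on w
    v += eta * gv                     # ascent step on v

# For this objective the strong PD risks reduce to closed forms:
#   population: 0.5*(w - mu)^2   + 0.5*(v - mu)^2
#   empirical:  0.5*(w - zbar)^2 + 0.5*(v - zbar)^2, zbar = sample mean
zbar = z.mean()
pd_pop = 0.5 * (w - mu) ** 2 + 0.5 * (v - mu) ** 2
pd_emp = 0.5 * (w - zbar) ** 2 + 0.5 * (v - zbar) ** 2
gen_err = pd_pop - pd_emp             # estimation error
print(pd_pop, pd_emp, gen_err)        # pd_pop == gen_err + pd_emp, as in (2.2)
```

SGDA here behaves as an exponential moving average of the samples, so both iterates hover near the empirical saddle point $\bar{z}$; the empirical PD risk (optimization error) is then small, and what remains of the population risk is mostly the generalization term.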