This study investigates the mediating and moderating roles of engagement (E), foreign language enjoyment (FLE), and ambiguity tolerance (AT) in metaverse-based foreign language learning (FLL). Featuring augmentation- and simulation-based experiences and self-fulfillment- and external-control-oriented mechanics, the metaverse provides virtual interactive environments that involve individuals' embodied presence and behaviors, which aligns with the emphasis of FLL on social interaction. Based on quantitative survey data, partial least squares structural equation modeling (PLS-SEM) analyses examine the significance and direction of the mediation and moderation relations. The results show that E positively and fully mediates the effect of intrinsic motivation (IM) on learning effectiveness (LE), that FLE positively and partially mediates the effect of classroom social climate (CSC) on E, and that AT negatively moderates the positive effect of E on LE. Notably, FLE does not significantly mediate the effect of growth mindset (GM) on E. Efficient metaverse-based FLL therefore requires a synergy of affective factors, i.e., intrinsic motivation, perceptions of classroom social climate, a moderate degree of ambiguity tolerance, and engagement, to sustain long-term language learning progress in virtual interactive experiences. At the theoretical level, the findings extend FLL-related models and advance the understanding of FLL. At the practical level, they offer guidance for more efficient metaverse implementations in FLL.