Domain generalization remains a ubiquitous challenge for machine learning in healthcare. Model performance in real-world conditions can fall short of expectations because the data encountered at deployment differ from the datasets used for model development. Under-representation of certain groups or conditions during model development is a common cause of this phenomenon, with serious implications: it can exacerbate bias against groups, individuals or conditions and propagate unintended harms in their care. This challenge is often not readily addressed by targeted data acquisition and labelling by expert clinicians, which can be prohibitively expensive or practically impossible given the rarity of some diseases and conditions and the scarcity of clinical expertise. We hypothesize that advances in generative artificial intelligence can help mitigate this unmet need in a steerable fashion, algorithmically enriching the training dataset with synthetic examples that compensate for underrepresented conditions or subgroups. We show that generative models can automatically learn realistic augmentations from data in a label-efficient manner. In particular, we leverage the greater abundance of unlabelled data to model the underlying distribution of different conditions and subgroups for an imaging modality. By conditioning generative models on appropriate labels (e.g., diagnostic and/or sensitive-attribute labels), we can steer the distribution of synthetic examples according to specific requirements. We demonstrate that these learned augmentations make models more robust and statistically fair both in- and out-of-distribution. To evaluate the generality of our approach, we study three distinct medical imaging contexts of varying difficulty: (i) histopathology images from a publicly available and widely adopted generalization benchmark, (ii) chest X-rays from publicly available clinical datasets, and (iii) dermatology images characterized by complex distribution shifts and varied imaging conditions, a particularly unstructured domain posing multiple challenges. Two of these imaging modalities further require operating at high resolution, necessitating faithful super-resolution techniques to recover the fine details of each health condition. Complementing real training samples with synthetic ones improves model robustness in all three medical tasks and increases fairness by improving the accuracy of clinical diagnosis within underrepresented groups. Our proposed approach leads to stark out-of-distribution improvements across modalities: a 7.7% gain in prediction accuracy in histopathology, a 5.2% gain in chest radiology with a 44.6% lower fairness gap, and a striking 63.5% improvement in high-risk sensitivity for dermatology with a 7.5× reduction in fairness gap.
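To make the conditioning-and-enrichment step concrete, the sketch below illustrates the idea in Python. It is a minimal sketch, not the authors' implementation: `toy_generator`, the `shortfalls` targets, and the sampling interface are illustrative assumptions standing in for a real label-conditioned generative model (e.g., a diffusion model trained on abundant unlabelled data) and dataset-specific group counts.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_generator(diagnosis, attribute, n, image_shape=(64, 64)):
    # Hypothetical stand-in for a trained conditional generative model;
    # returns noise here. A real model would synthesize images whose
    # appearance reflects the (diagnosis, attribute) conditioning.
    return rng.normal(size=(n, *image_shape))

def enrich_training_set(real_images, real_labels, shortfalls, generator):
    """Top up underrepresented (diagnosis, attribute) groups with
    synthetic examples until each reaches its target count."""
    images, labels = [real_images], list(real_labels)
    for (diag, attr), target in shortfalls.items():
        have = sum(1 for lbl in real_labels if lbl == (diag, attr))
        need = max(0, target - have)
        if need:
            # Steer the generator towards the underrepresented group.
            images.append(generator(diag, attr, need))
            labels.extend([(diag, attr)] * need)
    return np.concatenate(images, axis=0), labels

# Usage: 100 real images dominated by one group; enrich the rare ones.
real_images = rng.normal(size=(100, 64, 64))
real_labels = [("benign", "group_a")] * 90 + [("malignant", "group_b")] * 10
shortfalls = {("malignant", "group_b"): 90, ("benign", "group_b"): 50}

images, labels = enrich_training_set(
    real_images, real_labels, shortfalls, toy_generator)
print(images.shape)  # (230, 64, 64): 100 real + 80 + 50 synthetic
```

A downstream classifier is then trained on the combined real-plus-synthetic set; the `shortfalls` dictionary is the steerable knob that determines how strongly each underrepresented group is boosted.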