Acute graft-vs-host disease (GVHD) grading systems that use only clinical symptoms at treatment initiation, such as Minnesota risk, identify standard- and high-risk categories but lack a low-risk category suitable for strategies that minimize immunosuppression. We developed a new grading system that includes a low-risk stratum based on clinical symptoms alone and determined whether incorporating biomarkers would improve the model's prognostic accuracy. We randomly divided 1863 patients in the Mount Sinai Acute GVHD International Consortium (MAGIC) who were treated for GVHD into training and validation cohorts. Patients in the training cohort were divided into 14 groups based on similarity of clinical symptoms and similar nonrelapse mortality (NRM). We then used a classification and regression tree (CART) algorithm to create three Manhattan risk groups, which produced a significantly higher area under the receiver operating characteristic curve (AUC) for 6-month NRM than the Minnesota risk classification in the validation cohort (0.69 vs. 0.64, P=0.009). We integrated serum GVHD biomarker scores with Manhattan risk in patients with available serum samples and again used a CART algorithm to establish three MAGIC composite scores, which significantly improved prediction of NRM compared with Manhattan risk (AUC, 0.76 vs. 0.70, P=0.010). Each increase in MAGIC composite score also corresponded to a significant decrease in day 28 treatment response (80% vs. 63% vs. 30%, P<0.001). We conclude that the MAGIC composite score predicts response to therapy and long-term outcomes more accurately than systems based on clinical symptoms alone and may help guide clinical decisions and trial design.
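
The AUC values reported above summarize how well an ordinal risk group separates patients who experienced NRM from those who did not. As a minimal illustrative sketch, not the study's actual analysis, the following Python code computes an AUC for a three-level risk score using the Mann-Whitney formulation. It assumes 6-month NRM can be treated as a simple binary outcome, ignoring censoring and competing risks; the function name `auc` and the toy data are hypothetical and are not drawn from the MAGIC cohort.

```python
from itertools import product


def auc(scores, events):
    """Mann-Whitney estimate of the AUC.

    Returns the probability that a randomly chosen patient with the event
    has a higher risk score than a randomly chosen patient without it.
    Tied scores count as one half.
    """
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")

    total = 0.0
    for p, n in product(pos, neg):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos) * len(neg))


# Hypothetical toy data (NOT study data): risk group 1-3 and a binary
# indicator for the outcome of interest.
risk_group = [1, 1, 1, 2, 2, 2, 3, 3]
nrm_event = [0, 0, 0, 0, 1, 0, 1, 1]

print(f"AUC = {auc(risk_group, nrm_event):.3f}")
```

With only three risk levels, many patient pairs are tied, which is why the tie-handling rule matters for coarse grading systems. Comparing two such AUCs on the same patients, as in the abstract, additionally requires a paired statistical test that this sketch does not implement.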