To the Editor

Choby and colleagues[1] conducted an extensive study to investigate whether incorporating Hyams grade into olfactory neuroblastoma staging systems would provide better prediction of clinical outcomes: overall survival (OS), disease-free survival (DFS), and disease-specific survival. They used C statistics as the metric to evaluate each working Cox model's predictability. There are several lessons we may learn from this study for future studies in a similar setting.

First, the authors[1] used the Harrell C statistic as the metric, which is known to be problematic. This statistic estimates a population quantity that involves the censoring distribution of the study patients. Thus, it does not estimate the concordance rate between the patient's Cox risk score and the outcome variable (eg, survival time), as the authors intended, and conclusions based on this statistic may be questionable. A valid alternative, proposed by Uno et al,[2] is now commonly used in practice.

Second, for the OS and DFS outcomes, the best-performing models yielded C statistics of 0.66 and 0.70, respectively, which are modest results (the C statistic ranges from 0.50 to 1; higher is better). Thus, more research is needed to identify other relevant variables to improve the models' performance.

Third, a C statistic quantifies a model's discriminative, not predictive, ability. A discriminative model simply identifies which patients will have events earlier and which later. Thus, the C statistic is a rather crude measure for evaluating a model's goodness of fit. For a model with good predictability, one would expect the predicted outcome to be very close to the actual event time. An intuitive measure of predictability is the average absolute difference between the observed and predicted outcomes across all study patients.[3-5] This metric is clinically interpretable and more relevant than a measure of discrimination.

Finally, the study[1] used a single data set with a modest sample size to build various working models and then assessed their relative merits. To avoid overfitting and overly optimistic assessments, it is customary to use 2 independent data sets: one for training, to derive candidate models, and the other for estimating the predictability of each model for model selection. If only 1 data set is available, standard cross-validation can be used to reduce the overfitting problem.
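
To illustrate the first point, the sketch below contrasts the Harrell C statistic with the inverse-probability-of-censoring-weighted (IPCW) alternative of Uno et al[2] on synthetic survival data, using the open-source scikit-survival package. The simulated cohort, the covariates, and the truncation time tau are illustrative assumptions and are not taken from the study.

```python
import numpy as np
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import concordance_index_censored, concordance_index_ipcw
from sksurv.util import Surv

# Synthetic data: event times depend on two covariates; censoring is independent.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
lin_pred = 0.8 * X[:, 0] - 0.5 * X[:, 1]
t_event = rng.exponential(scale=np.exp(-lin_pred))
t_cens = rng.exponential(scale=2.0, size=n)
y = Surv.from_arrays(event=t_event <= t_cens, time=np.minimum(t_event, t_cens))

# Fit a Cox model on one half and compute risk scores on the other half.
X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]
risk = CoxPHSurvivalAnalysis().fit(X_train, y_train).predict(X_test)

# Harrell's C (which depends on the censoring distribution) vs Uno's IPCW C,
# truncated at an illustrative follow-up time tau.
tau = np.percentile(y_train["time"], 80)
harrell = concordance_index_censored(y_test["event"], y_test["time"], risk)[0]
uno = concordance_index_ipcw(y_train, y_test, risk, tau=tau)[0]
print(f"Harrell C = {harrell:.3f}, Uno C = {uno:.3f}")
```

Under heavy censoring the two estimates can differ noticeably, which is the concern raised in the first point above.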
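
For the third point, the suggested metric, the average absolute difference between observed and predicted survival times, can be computed as in the deliberately naive sketch below, which simply restricts attention to patients whose events were observed. The censoring-adjusted estimators in the cited references[3-5] weight all patients rather than discarding censored ones; the function name and the example numbers here are hypothetical.

```python
import numpy as np

def mean_absolute_prediction_error(observed_time, event, predicted_time):
    """Average |observed - predicted| survival time among patients with observed events.

    Naive version: censored patients are dropped. The censoring-adjusted versions
    in the cited references use all patients instead.
    """
    observed_time = np.asarray(observed_time, dtype=float)
    predicted_time = np.asarray(predicted_time, dtype=float)
    event = np.asarray(event, dtype=bool)
    return float(np.mean(np.abs(observed_time[event] - predicted_time[event])))

# Hypothetical example: times in months from some fitted model.
obs = [12.0, 30.5, 8.2, 24.0, 40.1]
evt = [True, False, True, True, False]   # False = censored
pred = [15.0, 28.0, 10.0, 20.0, 35.0]
print(mean_absolute_prediction_error(obs, evt, pred))  # about 2.93 months
```

The result is directly interpretable on the time scale of the outcome, which is what makes this kind of measure clinically appealing compared with a discrimination index.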
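
On the final point, when only 1 data set is available, K-fold cross-validation gives a less optimistic estimate of each candidate model's performance than evaluating on the same data used for fitting. The sketch below compares two hypothetical candidate Cox models by their cross-validated Uno C statistic on synthetic data; the covariate sets merely stand in for staging systems with and without an added grade variable and are not taken from the study.

```python
import numpy as np
from sklearn.model_selection import KFold
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import concordance_index_ipcw
from sksurv.util import Surv

# Synthetic cohort; only the first and third covariates carry signal.
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 4))
lin_pred = 0.7 * X[:, 0] - 0.4 * X[:, 2]
t_event = rng.exponential(scale=np.exp(-lin_pred))
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)
y = Surv.from_arrays(event=t_event <= t_cens, time=time)

# Two hypothetical candidate models defined by different covariate sets.
candidates = {"all four covariates": [0, 1, 2, 3], "first two covariates": [0, 1]}
tau = np.percentile(time, 80)  # illustrative truncation time for Uno's C

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, cols in candidates.items():
    scores = []
    for train_idx, test_idx in cv.split(X):
        # Fit on the training folds, evaluate on the held-out fold.
        model = CoxPHSurvivalAnalysis().fit(X[np.ix_(train_idx, cols)], y[train_idx])
        risk = model.predict(X[np.ix_(test_idx, cols)])
        scores.append(concordance_index_ipcw(y[train_idx], y[test_idx], risk, tau=tau)[0])
    print(f"{name}: cross-validated Uno C = {np.mean(scores):.3f}")
```

Averaging the held-out performance across folds, rather than reusing the fitting data, is what keeps the comparison of candidate models from being overly optimistic.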