Background
Inferring response to antiretroviral therapy from the viral genotype alone is challenging. The utility of an intermediate step that predicts in vitro drug susceptibility is currently controversial. Here, we provide a retrospective comparison of approaches that use either genotype or predicted phenotype alone, or the two in combination.

Methods
Treatment change episodes were extracted from two large databases, one from the USA (Stanford-California) and one from Europe (EuResistDB), comprising data from 6,706 and 13,811 patients, respectively. Response to antiretroviral treatment was dichotomized according to two definitions. Using the viral sequence and the treatment regimen as input, three expert algorithms (ANRS, Rega and HIVdb) were used to generate genotype-based encodings, and VircoTYPE™ 4.0 (Virco BVBA, Mechelen, Belgium) was used to generate a predicted phenotype-based encoding. Single-drug classifications were combined into a treatment score either by simple summation or by statistical learning using random forests. Classification performance was assessed on the Stanford-California data using cross-validation and, in addition, on the independent EuResistDB data.

Results
In all experiments, predicted phenotype was among the most sensitive approaches. Combining single-drug classifications by statistical learning was significantly superior to unweighted summation (P < 2.2 × 10⁻¹⁶). Classification performance could be increased further by combining predicted phenotypes with expert encodings, but not by combining expert encodings alone. These results were confirmed on an independent test set comprising data solely from EuResistDB.

Conclusions
This study demonstrates consistent performance advantages, in most scenarios, of using predicted phenotype over methods based on genotype alone for inferring virological response. Moreover, all approaches under study benefit significantly from statistical learning when merging single-drug classifications into treatment scores.
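To make the unweighted-summation baseline from the Methods concrete, the following is a minimal sketch under stated assumptions. The abstract does not specify how single-drug calls are encoded or where the response cutoff lies, so the weights (susceptible = 1.0, intermediate = 0.5, resistant = 0.0), the `threshold` value, the function names and the example regimen are all hypothetical illustrations, not the study's actual encoding.

```python
# Hypothetical encoding of single-drug calls (NOT taken from the study):
# susceptible = 1.0, intermediate = 0.5, resistant = 0.0
CALL_WEIGHTS = {"S": 1.0, "I": 0.5, "R": 0.0}


def treatment_score(drug_calls):
    """Unweighted sum of per-drug activity over all drugs in a regimen.

    drug_calls maps a drug name to its single-drug call ("S", "I" or "R").
    """
    return sum(CALL_WEIGHTS[call] for call in drug_calls.values())


def predict_response(drug_calls, threshold=2.0):
    """Dichotomize the score: predict success if it reaches a hypothetical cutoff."""
    return treatment_score(drug_calls) >= threshold


if __name__ == "__main__":
    # Illustrative three-drug regimen with made-up calls.
    regimen = {"TDF": "S", "FTC": "R", "EFV": "I"}
    print(treatment_score(regimen))   # 1.5
    print(predict_response(regimen))  # False
```

Every drug contributes with equal weight here. The statistical-learning alternative described above instead feeds the single-drug classifications to a random forest, which can learn unequal contributions and interactions between drugs rather than relying on a fixed sum and cutoff.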