In this work, we compare the results of molecular dynamics simulations involving the application of three generalized Born (GB) models to 10 different proteins. The three GB models, the Still, HCT, and modified analytical generalized Born models, were implemented in the computationally efficient GROMACS package. The performance of each model was assessed from the backbone rms deviation from the native structure, the number of native hydrogen bonds retained in the simulation, and a comparison of the experimental and calculated radii of gyration. Analysis of variance (ANOVA) was used to analyze the results of the simulations. The rms deviation measure was found to be unable to distinguish the quality of the results obtained with the three different GB models, whereas the number of native hydrogen bonds and the radius of gyration yielded a statistically meaningful discrimination among models. Our results suggest that, of the three, the modified analytical generalized Born model yields the best agreement between calculated and experimentally derived structures. More generally, our study highlights the need both to evaluate the effects of different variables on the results of simulations and to verify that the results of molecular dynamics simulations are statistically meaningful.

analysis of variance | molecular dynamics

Molecular dynamics (MD) simulations are widely used in the study of protein structure and function (1, 2). The results of a given simulation depend on a number of factors, such as the quality of the molecular force field, the treatment of solvent, the timescale of the simulation, and the sampling efficiency of the simulation protocol. There has been an enormous investment in the underlying technology in each of these areas, and the range of application of MD simulations has expanded greatly since the method was first introduced (3). However, by their very nature, MD simulations are not easy to compare to experiment. Simulations of protein systems require a considerable amount of computer time, and the results in general cannot be compared directly with experimental observables without additional processing. When such comparisons are made, the results are often very encouraging, but, given the multiplicity of parameters and computational methodologies that are used, it is hard to know whether the success of a particular protocol, e.g., one using one of the available force fields, indicates that all other force fields will yield results of comparable quality. Fortunately, the growth in available computer power, combined with the development of highly optimized computer code, has made careful comparison of methods more feasible and, in addition, should make it increasingly possible to test whether a particular result is robust with respect to the parameters of the calculation. Moreover, it would be highly desirable to know whether a particular methodology or combination of parameters is best suited to a particular application and to understand the reasons for performance differences, to the extent that they exist. Th...
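To illustrate the kind of statistical analysis described in the abstract, the following Python sketch applies a one-way ANOVA (scipy.stats.f_oneway) to per-protein radii of gyration obtained with each of the three GB models. This is a minimal sketch, not the pipeline used in this work: the file names, the loader, and the choice of the radius of gyration as the metric are illustrative assumptions, and the per-protein values would in practice be extracted from the simulation trajectories.

    import numpy as np
    from scipy import stats

    # Hypothetical helper: load the mean radius of gyration for each of the
    # 10 proteins simulated with one GB model (one value per line in a text
    # file). In practice these values would be computed from the GROMACS
    # trajectories (e.g., with the gyration-radius analysis tool).
    def load_mean_rg(path):
        return np.loadtxt(path)

    # Illustrative, hypothetical file names for the three GB models.
    rg_still = load_mean_rg("rg_still.dat")
    rg_hct   = load_mean_rg("rg_hct.dat")
    rg_magb  = load_mean_rg("rg_magb.dat")

    # One-way ANOVA: does the choice of GB model have a statistically
    # significant effect on the calculated radius of gyration?
    f_stat, p_value = stats.f_oneway(rg_still, rg_hct, rg_magb)
    print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

A small p-value in such a test would indicate that at least one model's radii of gyration differ systematically from the others, which is the sense in which a metric can "discriminate among models"; the same construction applies to the backbone rms deviation or the number of retained native hydrogen bonds.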