field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (1978) once argued that evolutionary biology was unscientific because its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (1982) to the present. We compare the general log-likelihood ratio statistic (the G or G^2 statistic) between the evolutionary tree model and the multinomial model with marginalized tests applied to an alignment of placental mammal coding sequence data. The most general test does not reject the fit of data to model (p ~ 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (p < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (p < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa, and these are not necessarily the taxa expected from examining other regions of the genome. When the 4^t site patterns of the i.i.d. model are marginalized to observed and expected parsimony counts (that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length), the likelihood ratio test regains power, and it too rejects the evolutionary model with p << 0.001. Given such behavior over relatively recent evolutionary time, readers should maintain a healthy skepticism of results, as the scale of the systematic errors in published analyses may be far larger than the analytical methods (e.g., bootstrap) report.

Keywords: Fit of sequence data to evolutionary model, base composition stationarity, placental / eutherian mammals.

Waddell, Ota and Penny
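
As a rough illustration of the statistic discussed above, the following sketch computes G = 2 * sum(O * ln(O / E)) over site-pattern cells and then recomputes it after pooling cells into coarser classes. It is a minimal sketch under stated assumptions, not the procedure used in the analyses: the function names `g_statistic` and `marginalize`, the three class labels, and all counts are hypothetical and chosen only for illustration. Degrees of freedom and p-values are deliberately left out, since they depend on the number of parameters estimated under the tree model.

```python
import math
from collections import defaultdict


def g_statistic(observed, expected):
    """Return G = 2 * sum(O * ln(O / E)); cells with O == 0 contribute 0."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:
            if e <= 0:
                raise ValueError("expected count must be positive where observed > 0")
            total += o * math.log(o / e)
    return 2.0 * total


def marginalize(counts, labels):
    """Pool counts that share a label; classes are returned in sorted label order."""
    pooled = defaultdict(float)
    for count, label in zip(counts, labels):
        pooled[label] += count
    return [pooled[key] for key in sorted(pooled)]


# Hypothetical site-pattern counts (NOT data from the paper).
observed = [60, 20, 5, 3, 12, 0]
expected = [55.0, 20.0, 8.0, 4.0, 10.0, 3.0]  # same total as observed
# Hypothetical assignment of each pattern to a coarse class.
labels = ["constant", "constant", "singleton", "singleton",
          "informative", "informative"]

g_full = g_statistic(observed, expected)
g_marg = g_statistic(marginalize(observed, labels),
                     marginalize(expected, labels))

print(f"G on all patterns:        {g_full:.4f}")
print(f"G on marginalized classes: {g_marg:.4f}")
```

Pooling can only lower or preserve G when observed and expected counts share the same total, but it also cuts the number of cells sharply. With 4^t possible patterns most cells are empty or nearly so, so a test on a few well-populated classes can be better behaved than the test on the full pattern vector.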