Bias in dairy genetic evaluations, when it exists, has to be understood and properly addressed. The origin of biases is not always clear. We analyzed 40 yr of records from the Lacaune dairy sheep breeding program to evaluate the extent of bias, assess possible corrections, and emit hypotheses on its origin. The data set included 7 traits (milk yield, fat and protein contents, somatic cell score, teat angle, udder cleft, and udder depth) with records from 600,000 to 5 million depending on the trait, ~1,900,000 animals, and ~5,900 genotyped elite artificial insemination rams. For the ~8% animals with missing sire, we fit 25 unknown parent groups. We used the linear regression method to compare "partial" and "whole" predictions of young rams before and after progeny testing, with 7 cut-off points, and we obtained estimates of their bias, (over)dispersion, and accuracy in early proofs. We tried (1) several scenarios as follows: multiple or single trait, the "official" (routine) evaluation, which is a mixture of both single and multiple trait, and "deletion" of data before 1990; and (2) several models as follows: BLUP and single-step genomic (SSG)BLUP with fixed unknown parent groups or metafounders, where, for metafounders, their relationship matrix gamma was estimated using either a model for inbreeding trend, or base allele frequencies estimated by peeling. The estimate of gamma obtained by modeling the inbreeding trend resulted in an estimated increase of inbreeding, based on markers, faster than the pedigree-based one. The estimated genetic trends were similar for most models and scenarios across all traits, but were shrunken when gamma was estimated by peeling. This was due to shrinking of the estimates of metafounders in the latter case. Across scenarios, all traits showed bias, generally as an overestimate of genetic trend for milk yield and an underestimate for the other traits. As for the slope, it showed overdispersion of estimated breeding values for all traits. Using multiple-trait models slightly reduced the overestimate of genetic trend and the overdispersion, as did including genomic information (i.e., SSGBLUP) when the gamma matrix was estimated by the model for inbreeding trend. However, only deletion of historical data before 1990 resulted in elimination of both kind of biases. The SSGBLUP resulted in more accurate early proofs than BLUP for all traits. We considered that a snowball effect of small errors in each genetic evaluation, combined with selection, may have resulted in biased evaluations. Improving statistical methods reduced some bias but not all, and a simple solution for this data set was to remove historical records.