Given recent controversies over some statistical methods in neuroimaging, we compare the most frequently used functional Magnetic Resonance Imaging (fMRI) analysis packages, AFNI, FSL and SPM, with regard to temporal autocorrelation modeling. This process, sometimes known as pre-whitening, is conducted in virtually all task fMRI studies. We employ eleven datasets containing 980 scans corresponding to different fMRI protocols and subject populations. Though autocorrelation modeling in AFNI is not perfect, it performs considerably better than autocorrelation modeling in FSL and SPM. The residual autocorrelated noise in FSL and SPM leads to heavily confounded first-level results, particularly for low-frequency experimental designs. Our results also show that SPM's alternative pre-whitening method, FAST, outperforms SPM's default. The reliability of task fMRI studies would increase with more accurate autocorrelation modeling. Reliability could increase further if the packages provided diagnostic plots, which would make the investigator aware of pre-whitening problems.
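To make the notion of pre-whitening concrete, the following is a minimal sketch on simulated data. It assumes a first-order autoregressive, AR(1), noise model and a simple two-step whitening; it is not the procedure implemented in AFNI, FSL or SPM, and the function names, the boxcar design and the simulation parameters are hypothetical choices made only for illustration. The lag-1 autocorrelation of the residuals serves as a crude numerical stand-in for the kind of diagnostic discussed above.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series (hypothetical helper)."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def ols_residuals(X, y):
    """Ordinary least squares fit; returns coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def prewhiten_ar1(X, y):
    """Two-step AR(1) pre-whitening (an assumed, simplified noise model).

    1. Fit OLS and estimate rho from the residuals.
    2. Filter both y and X with (1 - rho * lag) and refit.
    The first time point is dropped by the filtering step.
    """
    _, resid = ols_residuals(X, y)
    rho = lag1_autocorr(resid)
    X_w = X[1:] - rho * X[:-1]
    y_w = y[1:] - rho * y[:-1]
    beta_w, resid_w = ols_residuals(X_w, y_w)
    return rho, beta_w, resid_w


# Simulated time series: intercept + boxcar regressor + AR(1) noise.
rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)
boxcar = ((t // 20) % 2).astype(float)  # 20 points off, 20 points on
X = np.column_stack([np.ones(n), boxcar])

true_rho = 0.5  # assumed noise autocorrelation for the simulation
eps = rng.standard_normal(n)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = true_rho * noise[i - 1] + eps[i]

y = X @ np.array([10.0, 1.0]) + noise

_, resid_ols = ols_residuals(X, y)
rho_hat, beta_w, resid_w = prewhiten_ar1(X, y)

r_before = lag1_autocorr(resid_ols)  # residual autocorrelation without whitening
r_after = lag1_autocorr(resid_w)     # residual autocorrelation after whitening

print("lag-1 autocorrelation before whitening:", r_before)
print("lag-1 autocorrelation after whitening:", r_after)
```

If the assumed noise model matches the data, the residual autocorrelation after whitening should be close to zero; a value that stays clearly away from zero suggests that the noise model is inadequate.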