Research into neuroimaging biomarkers for late-life depression (LLD) has identified neural correlates of the disorder, including increased white matter hyperintensities and reduced hippocampal volume. However, findings across these studies largely fail to converge. This lack of replicability is potentially due to challenges related to construct variability, etiological heterogeneity, and experimental rigor. We discuss suggestions to help address these challenges, including improved construct standardization, larger sample sizes, multimodal approaches to parse heterogeneity, and the use of individualized analytical models.