To describe a database of longitudinally graded telemedicine retinal images to be used as a comparator for future studies assessing grader recall bias and the ability to detect both typical progression (e.g., International Classification of Retinopathy of Prematurity (ICROP) stages) and incremental changes in retinopathy of prematurity (ROP).
The cohort comprised retinal images from 84 eyes of 42 patients who were screened sequentially for ROP over 6 consecutive weeks in a telemedicine program and then followed to vascular maturation, or to treatment and subsequent disease stabilization. De-identified retinal images from the 6 weekly exams (2520 total images) were graded by an ROP expert according to whether ROP had improved, worsened, or stayed the same compared to the prior week’s images, yielding an overall clinical “gestalt” score. To examine which parameters might have influenced the examiner’s ability to detect longitudinal change, the same ROP expert then graded the images by image view (central, inferior, nasal, superior, temporal) and by retinal component (vascular tortuosity, vascular dilation, stage, hemorrhage, vessel growth), again determining whether each retinal component, or ROP in each image view, had improved, worsened, or stayed the same compared to the prior week’s images. Agreement between gestalt scores and the view, component, and component-by-view scores was assessed using percent agreement, absolute agreement, and Cohen’s weighted kappa statistic to determine whether any of the hypothesized image features correlated with the ability to predict ROP disease trajectory.
The central view showed substantial agreement with gestalt scores (κ = 0.63), and the remaining views showed moderate agreement. Among retinal components, vascular tortuosity showed the greatest overall agreement with gestalt scores (κ = 0.42–0.61); all other components showed only slight to fair agreement.
This is a well-defined ROP database, graded by a single expert in a masked fashion in a real-world setting, with grades that correlated with the actual (remote in time) exams and known outcomes. It provides a foundation for subsequent study of telemedicine’s ability to assess ROP disease trajectory longitudinally, as well as for potential artificial intelligence approaches to retinal image grading, with the aim of expanding patient access to timely, accurate ROP screening.
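
The agreement measures named above (percent agreement and Cohen's weighted kappa) can be illustrated with a minimal sketch. Several details here are assumptions rather than the study's method: the abstract does not state the weighting scheme (linear weights are used by default below, with quadratic as an option), the category ordering worsened < same < improved is assumed, and the function names `percent_agreement` and `weighted_kappa` are hypothetical.

```python
import math

# Assumed ordinal ordering of the three change grades (not stated in the source).
CATEGORIES = ["worsened", "same", "improved"]


def percent_agreement(a, b):
    """Fraction of paired grades that are identical."""
    if len(a) != len(b) or not a:
        raise ValueError("grade lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(a, b, categories=CATEGORIES, weighting="linear"):
    """Cohen's weighted kappa for two lists of ordinal grades.

    weighting: "linear" or "quadratic" disagreement weights (an assumption;
    the source does not say which scheme was used).
    """
    if len(a) != len(b) or not a:
        raise ValueError("grade lists must be non-empty and of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions of (grade in a, grade in b).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1.0 / n

    # Marginal proportions for each set of grades.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weighting == "linear" else 2

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant pair of categories.
        return (abs(i - j) / (k - 1)) ** power

    observed_dis = sum(
        disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_dis = sum(
        disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )

    if expected_dis == 0:
        # Both lists use one identical category throughout; kappa is undefined.
        return math.nan
    return 1.0 - observed_dis / expected_dis
```

In this sketch, `a` would hold the gestalt grades and `b` the grades for one view or one component, paired by eye and exam week. Weighted kappa is used rather than unweighted kappa because the grades are ordinal: a "worsened" versus "improved" disagreement is penalized more heavily than a "worsened" versus "same" disagreement.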