Artificial intelligence (AI) continues to advance rapidly and is being applied broadly in our everyday lives, as well as in ophthalmology. Alongside growing awareness of these technologies has come an increasingly nuanced understanding of their potential limitations and of considerations for clinical implementation. In this issue of JAMA Ophthalmology, Paul et al1 describe the development of an AI algorithm for estimating best-corrected visual acuity (BCVA) from color fundus photographs among eyes with diabetic macular edema (DME), using data from the VISTA clinical trial, which compared the efficacy and safety of intravitreal aflibercept injection vs macular laser photocoagulation for DME over 3 years. The rationale is that such an algorithm could reduce the need for manual refraction and visual acuity measurement, which are often time-consuming and effort intensive. This study illustrates several important considerations for future clinical implementation of AI, many of which are broadly applicable beyond this specific use case.

First, not all algorithms may perform well for the desired application. While the algorithm in this study did demonstrate some ability to make reasonable BCVA estimates, its performance was relatively modest overall. The best-performing algorithm's mean absolute error, which compares the AI-predicted BCVA with the protocol-measured true BCVA, was 9.66 Early Treatment Diabetic Retinopathy Study (ETDRS) letters (95% CI, 9.05-10.28). Furthermore, 40% of AI-predicted BCVA values differed from the protocol-measured BCVA by more than 10 letters (the brief sketch at the end of this commentary illustrates how these error metrics are computed). It has been shown that visual acuity measurements obtained with habitual correction on ETDRS charts may be 1 to 2 lines (5 to 10 letters) worse than those obtained with protocol refraction.2 Given the modest performance of this AI algorithm in predicting BCVA and the existing variability in BCVA measurement, the role of this AI algorithm needs further validation. While adequate technical performance is a baseline threshold, subsequent evaluations need to compare AI algorithms against routine clinical practice to ensure that sufficient marginal benefit exists to justify the investment of resources required to translate the algorithms into practice.

Second, in addition to considering overall training sample size, it is important to ensure representation across a range of disease severity. The AI algorithms described in this study tended to perform worse among patients with poorer visual acuities. The authors1 hypothesized that this may have been due to the relatively small number of eyes with poorer visual acuities represented in the training data. AI algorithms, especially deep-learning algorithms like the ones used in this study, are data hungry and often require large amounts of training data to achieve high performance. The authors also engaged in a particularly interesting subanalysis examining outlier cases.
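For readers less familiar with the error metrics discussed above, the following minimal sketch shows how a mean absolute error and the proportion of predictions off by more than 10 letters can be computed. The BCVA values are invented for illustration and are not data from the study.

```python
import numpy as np

# Hypothetical BCVA scores in ETDRS letters (invented values, not study data).
ai_predicted = np.array([62.0, 71.0, 55.0, 80.0, 48.0])
protocol_measured = np.array([70.0, 69.0, 68.0, 77.0, 60.0])

# Absolute prediction error for each eye.
errors = np.abs(ai_predicted - protocol_measured)

# Mean absolute error (the study reports 9.66 letters; 95% CI, 9.05-10.28).
mae = errors.mean()

# Proportion of eyes whose AI-predicted BCVA differs from the
# protocol-measured BCVA by more than 10 letters (the study reports 40%).
share_over_10 = (errors > 10).mean()

print(f"MAE: {mae:.2f} letters; >10-letter errors: {share_over_10:.0%}")
```

One common way to attach a confidence interval to such an error metric, as reported in the study, is to bootstrap the per-eye errors, although the study's exact procedure is not assumed here.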