Online ahead of print.)

TO THE EDITOR: We read the study by Xiong et al,1 in which the authors trained and validated a bimodal deep learning (DL) algorithm, FusionNet, to detect glaucomatous optic neuropathy (GON) from both OCT images and visual fields (VFs). It is impressive that the proposed DL algorithm achieved performance comparable to that of experienced glaucoma specialists and outperformed the 2 single-modality models trained on only OCT or only VF data, respectively. Nevertheless, there remain some important issues we would like the authors to clarify.

First, it is essential to consider the potential application scenarios before developing any DL model. In recent years, many DL algorithms, especially image-based ones, have been developed and shown to be promising for glaucoma classification in screening.2,3 In the study by Xiong et al, the proposed bimodal DL algorithm requires both OCT images and VFs for GON classification, which may hinder its feasibility in a screening scenario because VF testing is relatively subjective and tedious for patients.4 In addition, OCT and VF testing devices are not as widely available as fundus photography in lower-resource areas. Therefore, despite its promising performance, the proposed bimodal DL algorithm may still not be feasible for glaucoma screening in general populations.

Second, the authors considered training and testing the algorithm only on data from a Chinese ethnic group as a limitation and proposed further worldwide collaboration, which is worth encouraging. However, if the proposed DL algorithm is intended for local implementation in China, it may be reasonable to use data from Chinese ethnic groups only.
Therefore, we would like the authors to elaborate further on the current barriers that prevent patients' access to glaucoma screening or assessment in China and on how the proposed algorithm could bridge those gaps (e.g., by helping to address disparities and the imbalanced allocation of medical resources in China), instead of only pointing out that the proposed DL model might not be suitable for population screening because the training and testing data were all from tertiary settings.

Finally, the bimodal DL algorithm used paired OCT and VF data as input and produced a binary output (GON or no GON) after combining features from the 2 single-modality models, OCTNet and VFNet, through an attention module. We are interested to know whether the proposed DL algorithm can still extract features and offer an output when only one kind of data is available. In fact, it is quite common in clinics that patients cannot cooperate and complete all the examinations, especially VF testing.