Glaucoma, which causes progressive and irreversible damage to vision, is the second leading cause of blindness worldwide. The damage is principally assessed through visual field (VF) sensitivity, which is measured by costly visual field tests. To obtain a less costly estimate, a promising method is to first measure retinal layer thickness (RT) by optical coherence tomography and then map RT to VF. Recent studies show that this mapping can be learned effectively by convolutional neural networks (CNNs). However, different stages of glaucoma, e.g., the early stage and the severe stage, give rise to different relations between VF and RT, and these stages have not yet been distinguished during learning. Consequently, the learned mapping may fit each of these stages poorly, especially when the training data are small-scale. In this paper, we propose, for the first time, an approach that distinguishes two glaucoma stages when learning the mapping. Our approach has two phases: the first estimates the probability that an RT measurement belongs to a particular stage, and the second learns stage-specific mappings. For the probability estimation, we employ a CNN-based classification model, and, drawing on domain knowledge of the data, we design a set of data augmentation techniques for training as well as a test-time augmentation technique for testing. To learn the stage-specific mappings, we design a regression model with a novel two-branch architecture. Moreover, we introduce a novel attention mechanism into the CNNs used for both classification and regression. Through experiments on a real-world dataset, we demonstrate that our proposed method outperforms existing methods by 6.82% in terms of the mean root mean square error (RMSE) for early-stage glaucoma.
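To make the two-phase idea and the evaluation metric concrete, the following is a minimal sketch in plain Python rather than the method itself. It assumes that the stage probability produced by the classifier is used to weight the two stage-specific VF estimates; the abstract does not state how the two branches are combined, so this blending rule is only one plausible reading. It also assumes that the mean RMSE is the per-eye RMSE averaged over all eyes. The function names (blend_stage_predictions, rmse, mean_rmse) and the numbers in the usage example are hypothetical.

```python
import math


def blend_stage_predictions(p_early, vf_early, vf_severe):
    """Mix two stage-specific VF estimates using the stage probability.

    Assumption: a simple probability-weighted average of the two branches.
    p_early   -- estimated probability that the eye is in the early stage
    vf_early  -- VF sensitivities predicted by the early-stage mapping
    vf_severe -- VF sensitivities predicted by the severe-stage mapping
    """
    if not 0.0 <= p_early <= 1.0:
        raise ValueError("p_early must lie in [0, 1]")
    if len(vf_early) != len(vf_severe):
        raise ValueError("both branches must predict the same VF points")
    return [p_early * e + (1.0 - p_early) * s
            for e, s in zip(vf_early, vf_severe)]


def rmse(pred, target):
    """Root mean square error over the VF points of one eye."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def mean_rmse(preds, targets):
    """Average of the per-eye RMSE values (assumed meaning of 'mean RMSE')."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal length")
    return sum(rmse(p, t) for p, t in zip(preds, targets)) / len(preds)


# Usage with made-up numbers: two VF points, 75% early-stage probability.
blended = blend_stage_predictions(0.75, [30.0, 28.0], [10.0, 8.0])
```

Under this assumed rule, a confident classifier output lets one branch dominate the estimate, while an uncertain one falls back to an average of the two stage-specific mappings.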