“…The baseline schedule starts from the initial learning rate and then decays it by the discount factor once every discount step. In the baseline experiments, we test all combinations of the initial learning rate in [0.1, 0.01, 0.001, 0.0001], the discount step in [10, 20, 50, 100], and the discount factor in [0.99, 0.9, 0.88]. After choosing the best baseline schedule, we run it 10 times with the same set of hyper-parameters and report the mean and standard deviation of the test loss and accuracy.…”
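
The baseline schedule and its search grid can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the decay is multiplicative (the learning rate is multiplied by the discount factor once every discount step), it leaves open whether a "step" is an epoch or an iteration because the passage does not say, and the names `step_decay_lr` and `baseline_grid` are hypothetical.

```python
from itertools import product

# Grid values taken from the passage above.
INITIAL_LRS = [0.1, 0.01, 0.001, 0.0001]
DISCOUNT_STEPS = [10, 20, 50, 100]
DISCOUNT_FACTORS = [0.99, 0.9, 0.88]


def step_decay_lr(initial_lr, discount_factor, discount_step, step):
    """Learning rate at a given step under a step-decay schedule.

    Assumption: the rate is multiplied by `discount_factor` once every
    `discount_step` steps, starting from `initial_lr` at step 0.
    """
    num_decays = step // discount_step  # completed decay intervals so far
    return initial_lr * discount_factor ** num_decays


def baseline_grid():
    """All (initial_lr, discount_step, discount_factor) combinations."""
    return list(product(INITIAL_LRS, DISCOUNT_STEPS, DISCOUNT_FACTORS))


if __name__ == "__main__":
    grid = baseline_grid()
    print(len(grid))  # number of baseline configurations
    print(step_decay_lr(0.1, 0.9, 10, step=25))  # rate after two decays
```

The grid contains 4 × 4 × 3 = 48 configurations. Under the multiplicative assumption, a schedule with initial rate 0.1, discount factor 0.9, and discount step 10 keeps the rate at 0.1 for steps 0 to 9 and lowers it to roughly 0.09 at step 10.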