Gradient Descent (GD) is used to find a local minimum of a function; here its purpose is to find the parameters of an error function so that a model fits the data with minimum error. The purpose of this research is to determine how many iterations are needed and what level of accuracy is achieved in predicting the data when using the standard Gradient Descent (GD) function and the GD with Momentum and Adaptive Learning Rate (GDMALR) function. In this study, the data processed with the gradient descent functions are the School Participation Rate (SPR) in Indonesia for ages 19-24, covering 2011 to 2017. This age range was selected because it is one of the factors that determine the success of education in a country, especially Indonesia. SPR is known as one of the indicators of successful development of education services in an area, whether a Province, Regency or City in Indonesia. The higher the SPR value, the more successful the area is considered to be in providing access to education services. The SPR data are taken from the Indonesian Central Bureau of Statistics. This study uses 3 network architecture models, namely 5-5-1, 5-15-1 and 5-25-1. Of the 3 models, the best is 5-5-1, which reached 94% accuracy and an MSE of 0.0008658637 in 6202 epochs. This model is then used to predict SPR in Indonesia for the next 3 years (2018-2020). These results are expected to help the Indonesian government further improve scholarships and the quality of education in the future.
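The difference between the two training functions compared above can be illustrated with a minimal sketch. It is not the network or the data used in this study: it fits a single line to five made-up points, and the function names, the toy data, the stopping tolerance, and the constants (momentum 0.9, learning-rate increase 1.05, decrease 0.7, allowed error growth 1.04) are all assumptions chosen for illustration. The adaptive rule shown (grow the learning rate after a step that lowers the error, shrink it and discard the step when the error rises too much) is one common heuristic and may differ from the exact variant used here.

```python
# Toy comparison of standard GD and GD with momentum + adaptive learning rate.
# All data and constants below are illustrative assumptions, not study values.

def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, b, xs, ys):
    """Gradient of the MSE with respect to w and b."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def gd_standard(xs, ys, lr=0.05, tol=1e-6, max_epochs=100_000):
    """Plain gradient descent with a fixed learning rate."""
    w = b = 0.0
    for epoch in range(1, max_epochs + 1):
        gw, gb = grad(w, b, xs, ys)
        w -= lr * gw
        b -= lr * gb
        if mse(w, b, xs, ys) < tol:
            return w, b, epoch
    return w, b, max_epochs

def gd_momentum_adaptive(xs, ys, lr=0.05, mc=0.9, lr_inc=1.05, lr_dec=0.7,
                         max_inc=1.04, tol=1e-6, max_epochs=100_000):
    """Gradient descent with a momentum term and an adaptive learning rate."""
    w = b = 0.0
    vw = vb = 0.0                      # velocity (previous update)
    err = mse(w, b, xs, ys)
    for epoch in range(1, max_epochs + 1):
        gw, gb = grad(w, b, xs, ys)
        new_vw = mc * vw - lr * gw     # momentum blends old and new updates
        new_vb = mc * vb - lr * gb
        new_w, new_b = w + new_vw, b + new_vb
        new_err = mse(new_w, new_b, xs, ys)
        if new_err > max_inc * err:
            # Error grew too much: discard the step, shrink lr, reset momentum.
            lr *= lr_dec
            vw = vb = 0.0
            continue
        if new_err < err:
            lr *= lr_inc               # step helped: try a larger learning rate
        w, b, vw, vb, err = new_w, new_b, new_vw, new_vb, new_err
        if err < tol:
            return w, b, epoch
    return w, b, max_epochs

# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2 * x + 1 for x in xs]

w_s, b_s, epochs_s = gd_standard(xs, ys)
w_m, b_m, epochs_m = gd_momentum_adaptive(xs, ys)
print("standard GD:        ", w_s, b_s, "epochs:", epochs_s)
print("momentum + adaptive:", w_m, b_m, "epochs:", epochs_m)
```

Both functions return the fitted parameters together with the number of epochs used, which mirrors the two quantities this study compares: iterations needed and resulting error.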