Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered: (a) bagging, (b) boosting, and (c) stacking. The bagging methods were bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting methods were Cubist and gradient boosting machine (GBM). Stacking is an ensemble method that uses a meta-model to combine the predictions of base models; this study used RF and GBM as the meta-models, with decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset was divided by stratified random sampling into a 70/30 split of training and test data, and the process was repeated three times. The performance of the six ensemble methods in the three categories was analyzed based on the average of the three attempts. GBM performed the best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by a spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that, as groups, the bagging and boosting methods performed equally well, and the stacking methods ranked third for the erosion pin dataset considered in this study.
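The three scores reported in this abstract can be computed as in the following sketch. It is a minimal illustration, not the authors' code: the function names and sample values are hypothetical, and the index of agreement is assumed to be Willmott's original form, which the abstract does not specify.

```python
import math

def rmse(obs, pred):
    """Root-mean-square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def index_of_agreement(obs, pred):
    """Index of agreement d (assumed form: Willmott, 1981), between 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - sse / potential

# Hypothetical erosion depths in mm/year, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]

rmse_value = rmse(observed, predicted)
nse_value = nse(observed, predicted)
d_value = index_of_agreement(observed, predicted)
```

NSE compares a model against a predictor that always returns the observed mean, so an NSE of 0.54 indicates that the model's squared error is about 46% of that baseline's.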
Soil erosion is a global problem that will become worse as a result of climate change. While many parts of the world are speculating about the effect of increased rainfall intensity and frequency on soil erosion, Taiwan’s mountainous areas already face rainfall erosivity more than six times the global average. To improve the modeling of soil erosion under extreme rainfall on highly rugged terrain, we use two analysis units to simulate soil erosion in the Shihmen Reservoir watershed in northern Taiwan. The first is the grid cell method, which divides the study area into 10 m by 10 m grid cells. The second is the slope unit method, which divides the study area using natural breaks in landform. We compared the modeling results with field measurements from erosion pins. To our surprise, the grid cell method is much more accurate in predicting soil erosion than the slope unit method, although the slope unit method resembles the real terrain much better. The average erosion pin measurement is 6.5 mm in the Shihmen Reservoir watershed, which is equivalent to 90.6 t ha⁻¹ yr⁻¹ of soil erosion.
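The equivalence between an erosion depth and a mass loss per unit area depends on the soil bulk density, which this abstract does not state. The sketch below back-calculates the density implied by the two reported figures, assuming the 6.5 mm is an annual depth. The function name is hypothetical and the implied density is an inference, not a value from the study.

```python
def depth_to_mass_loss(depth_mm, bulk_density_t_per_m3):
    """Convert an erosion depth (mm) to soil loss (t/ha).

    1 mm of soil over 1 ha (10,000 m^2) is 10 m^3, so
    loss = depth_mm * 10 m^3/ha * bulk density (t/m^3).
    """
    return depth_mm * 10.0 * bulk_density_t_per_m3

# Figures reported in the abstract.
reported_depth_mm = 6.5
reported_loss_t_per_ha = 90.6

# Bulk density implied by the two figures (an inference, not a reported value).
implied_density = reported_loss_t_per_ha / (reported_depth_mm * 10.0)

# Round trip: the implied density reproduces the reported soil loss.
recovered_loss = depth_to_mass_loss(reported_depth_mm, implied_density)
```

Under these assumptions the implied bulk density is roughly 1.39 t/m³.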
The Shihmen Reservoir watershed is vital to the water supply in northern Taiwan, but the reservoir has been heavily impacted by sedimentation and soil erosion since 1964. The purpose of this study was to explore the capability of machine learning algorithms, such as decision tree and random forest, to predict soil erosion (sheet and rill erosion) depths in the Shihmen Reservoir watershed. The accuracy of the models was evaluated using the root mean squared error (RMSE), mean absolute error (MAE), and R². Moreover, the models were verified against multiple regression analysis, which is commonly used in statistical analysis. The predictors of these models were 14 environmental factors that influence soil erosion, whereas the target was the measurements from 550 erosion pins installed at 55 locations (on 55 slopes) and monitored over a period of approximately three years. The data sets for the models were separated into 70% training data and 30% testing data, using simple random sampling and stratified random sampling. The results show that the random forest algorithm performed the best of the three methods. Moreover, stratified random sampling gave the better results of the two sampling methods, as anticipated. The average error (RMSE relative to the 1:1 line) of the random forest algorithm with stratified random sampling is 0.93 mm/yr in the training data and 1.75 mm/yr in the testing data. Finally, the random forest algorithm identified type of slope, slope direction, and sub-watershed as the three most important of the 14 environmental factors for splits in the trees, and thus as the three factors that most affect the depth of sheet and rill erosion in the Shihmen Reservoir watershed. The results of this study can be employed by decision-makers to improve soil conservation planning and watershed remediation.
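A stratified 70/30 split can be sketched as follows. This is an illustration under stated assumptions rather than the study's procedure: the abstract does not say which variable defines the strata, so the `slope_type` field, the record layout, and the function name are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(records, stratum_of, train_frac=0.7, seed=0):
    """Split records so each stratum contributes about train_frac to training."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[stratum_of(record)].append(record)

    train, test = [], []
    for key in sorted(groups):
        members = groups[key][:]
        rng.shuffle(members)  # random order within the stratum
        cut = round(train_frac * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Hypothetical pins: 10 on each of two made-up slope types.
pins = [{"pin_id": i, "slope_type": "A" if i < 10 else "B"} for i in range(20)]

train_pins, test_pins = stratified_split(pins, lambda r: r["slope_type"])
train_a = sum(1 for r in train_pins if r["slope_type"] == "A")
train_b = sum(1 for r in train_pins if r["slope_type"] == "B")
```

Unlike simple random sampling, this keeps the proportion of each stratum the same in the training and testing sets, which matters when some strata have few pins.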
The estimation of soil erosion in Taiwan and many other countries is based on the widely used universal soil loss equation (USLE), which includes the soil erodibility factor (K-factor). In Taiwan, K-factor values are taken from past research compiled in the Taiwan Soil and Water Conservation Manual, but data are limited for the downstream area of the Shihmen Reservoir watershed. The designated K-factor from the manual cannot be directly applied at large regional scales, nor can it distinguish differences in soil erosion between small field plots or subdivisions. This study therefore establishes additional K-factor values using the double-ring infiltration test and measurements of soil physical and chemical properties, and increases the spatial resolution of the K-factor map for the Shihmen Reservoir watershed. Furthermore, the established K-factor values were validated against the designated value at Fuxing Sanmin from the manual to verify the correctness of the estimates. The comparison shows that the results agree well with established estimates within an allowable error range. Thus, the K-factors established by this study update the previous K-factor system, can be spatially estimated for any area of interest within the Shihmen Reservoir watershed, and improve upon past limitations.
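For context, the USLE in its standard form multiplies the K-factor by the rainfall erosivity, slope length and steepness, cover management, and support practice factors. The sketch below shows that structure. The factor values are hypothetical and are chosen only to show that the estimated soil loss scales linearly with K, which is why a more finely resolved K-factor map changes the estimate directly.

```python
def usle_soil_loss(r, k, ls, c, p):
    """Average annual soil loss A = R * K * LS * C * P (standard USLE form).

    r: rainfall erosivity, k: soil erodibility, ls: combined slope
    length and steepness factor, c: cover management, p: support practice.
    """
    return r * k * ls * c * p

# Hypothetical factor values, for illustration only.
base_loss = usle_soil_loss(r=10000.0, k=0.03, ls=5.0, c=0.01, p=1.0)

# Doubling K while holding the other factors fixed doubles the estimate.
doubled_k_loss = usle_soil_loss(r=10000.0, k=0.06, ls=5.0, c=0.01, p=1.0)
```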