Landslide Susceptibility Mapping in Guangdong Province, China, Using Random Forest Model and Considering Sample Type and Balance

Qinghai is rich in mineral resources, but frequent, large-scale mining has caused secondary damage to the fragile primary surface and triggered a large number of landslides. In complex geological environments marked by glacier ablation and frequent tectonic movement, a complete quantitative evaluation method for landslide risk in high-cold mining areas has not yet been established. In view of this, this article uses field survey and remote sensing data collected in 2012 for the Datong mining area in Qinghai Province as the basic data. We comprehensively considered five first-level factors (13 second-level factors), including topography, lithological structure, mining engineering activities, land use, and dynamic deformation, as evaluation indicators for landslide susceptibility in mining areas, and used the Topographic Wetness Index (TWI) and the Human Engineering Activity Intensity (HEAI) to quantitatively estimate landslide hazard according to the landslide trigger mechanism. The weight-of-evidence approach was used for landslide hazard and risk mapping under different landslide-inducing conditions. The results indicate that the extremely high-hazard areas induced by human engineering activities account for 14% of the total area, and the extremely high-risk areas account for 13% of the total area in the Datong mining area, so the extremely high-risk zone is large; the landslide risk assessment mapping model constructed in this study can effectively evaluate the probability of slope instability caused by rainfall and human engineering activities. 
The effective value of the receiver operating characteristic (ROC) curve of the sensitivity assessment model reaches 0.863, and the evaluation results are consistent with reality; using the weight-of-evidence model for landslide risk assessment is more in line with the actual situation in alpine mining areas, and is more suitable for guiding landslide risk management and disaster prevention and mitigation in mining areas.
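
As a rough illustration of the weight-of-evidence idea named in this abstract, the sketch below computes the positive weight, negative weight, and contrast for a single factor class from cell counts. The formulas are the standard textbook form, not necessarily the exact variant used by the authors; the function name and the example counts are hypothetical.

```python
import math

def weights_of_evidence(n_class_landslide, n_class_total, n_landslide, n_total):
    """Return (W_plus, W_minus, contrast) for one factor class.

    Assumes every cell of the 2x2 table (class present/absent versus
    landslide/no landslide) holds at least one mapping unit.
    """
    b_l = n_class_landslide                    # class present, landslide
    b_nl = n_class_total - n_class_landslide   # class present, no landslide
    nb_l = n_landslide - n_class_landslide     # class absent, landslide
    nb_nl = (n_total - n_class_total) - nb_l   # class absent, no landslide
    if min(b_l, b_nl, nb_l, nb_nl) <= 0:
        raise ValueError("all four cells must be positive for the logarithms")
    n_stable = n_total - n_landslide
    # W+ compares how often the class occurs in landslide vs stable units.
    w_plus = math.log((b_l / n_landslide) / (b_nl / n_stable))
    # W- does the same for the absence of the class.
    w_minus = math.log((nb_l / n_landslide) / (nb_nl / n_stable))
    return w_plus, w_minus, w_plus - w_minus

# Hypothetical counts: 1000 units, 100 landslides, 200 units in the class,
# 50 of which contain landslides.
w_plus, w_minus, contrast = weights_of_evidence(50, 200, 100, 1000)
print(round(w_plus, 4), round(w_minus, 4), round(contrast, 4))
```

A positive contrast suggests the class is spatially associated with landslides; a value near zero suggests little association.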

Section: Discussion (mentioning)

confidence: 99%

Landslide Risk Mapping Using the Weight-of-Evidence Method in the Datong Mining Area, Qinghai Province

Jiang et al. 2023

“…Meanwhile, the vertical axis signifies the true positive rate (sensitivity), indicating the accumulating percentage of landslide samples. The AUC value reflects the probability of a randomly chosen positive sample outranking a randomly chosen negative sample, and the model's effectiveness in accurately predicting landslide occurrence or absence is evaluated based on this metric [13]. In the case of AUC > 0.5, a higher AUC value signifies a superior model fit.…”
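
The probabilistic reading of AUC quoted above can be checked directly by counting pairs. The following is a minimal sketch, assuming ties count as half a win; the function name and the susceptibility scores are made up for illustration.

```python
def pairwise_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical susceptibility scores for landslide and non-landslide units.
landslide_scores = [0.9, 0.8, 0.4]
stable_scores = [0.7, 0.3, 0.2]
print(pairwise_auc(landslide_scores, stable_scores))  # 8 of 9 pairs are ordered correctly
```

The double loop is quadratic in the number of samples, so it is only meant to make the definition concrete; rank-based formulas give the same value more efficiently on large datasets.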

Section: Receiver Operating Characteristic (mentioning)

confidence: 99%

“…They have the advantages of clear physical meaning and accurate analysis results. However, they require many geological and hydrological parameters and are only suitable for analyzing specific types of landslides on a small scale [13]. Common conditional probability models include frequency ratio (FR), information value (IV), certainty factor (CF), evidential belief function (EBF), and weights of evidence (WOE).…”
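
To make the first two conditional probability models in this list concrete, here is a small sketch of frequency ratio (FR) and information value (IV) for the classes of one conditioning factor. It assumes the common definitions FR = (share of landslides in the class) / (share of area in the class) and IV = ln(FR); some studies use a different logarithm base. The class labels and counts are hypothetical.

```python
import math

def frequency_ratio(class_cells, class_landslides):
    """Return {class: FR} from per-class cell counts and landslide counts."""
    total_cells = sum(class_cells.values())
    total_landslides = sum(class_landslides.values())
    fr = {}
    for name, cells in class_cells.items():
        landslide_share = class_landslides.get(name, 0) / total_landslides
        area_share = cells / total_cells
        fr[name] = landslide_share / area_share
    return fr

def information_value(fr):
    """Return {class: ln(FR)}; classes with FR == 0 are skipped (log undefined)."""
    return {name: math.log(value) for name, value in fr.items() if value > 0}

# Hypothetical slope classes: number of cells and landslide cells per class.
cells = {"0-10": 500, "10-20": 300, ">20": 200}
landslides = {"0-10": 10, "10-20": 30, ">20": 60}

fr = frequency_ratio(cells, landslides)
iv = information_value(fr)
for name in cells:
    print(name, round(fr[name], 3), round(iv[name], 3))
```

An FR above 1 (IV above 0) means landslides are over-represented in that class relative to its area.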

Section: Introduction (mentioning)

confidence: 99%

Investigation of Landslide Susceptibility Decision Mechanisms in Different Ensemble-Based Machine Learning Models with Various Types of Factor Data

Lu, Ren, Yue et al. 2023

Machine learning (ML)-based methods of landslide susceptibility assessment primarily focus on two dimensions: accuracy and complexity. The complexity is influenced not only by the specific model framework but also by the type and complexity of the modeling data. Therefore, once good predictive performance has been achieved with capable ML methods, considering how factor data types affect the model's decision-making mechanism is important for assessing regional landslide characteristics and issuing landslide risk warnings. This study used the Shapley Additive exPlanations (SHAP) method to explain the decision-making mechanism of landslide susceptibility models built from different types of factor data. Furthermore, a comparative analysis was carried out to examine the differential effects of diverse data types for identical factors on model predictions. The study area selected was Cenxi, Guangxi, where a geographic spatial database was constructed by combining 23 landslide conditioning factors with 214 landslide samples from the region. Initially, the factors were standardized using five conditional probability models, frequency ratio (FR), information value (IV), certainty factor (CF), evidential belief function (EBF), and weights of evidence (WOE), based on the spatial arrangement of landslides. Together with the initial data, this yielded six types of factor databases. Subsequently, two ensemble-based ML methods, random forest (RF) and XGBoost, were utilized to build models for predicting landslide susceptibility. Various evaluation metrics were employed to compare the predictive capabilities of the different models and to determine the optimal model. 
Simultaneously, the analysis was conducted using the interpretable SHAP method for intrinsic decision-making mechanisms of different ensemble-based ML models, with a specific focus on explaining and comparing the differential impacts of different types of factor data on prediction results. The results of the study illustrated that the XGBoost-CF model constructed with CF values of factors not only exhibited the best predictive accuracy and stability but also yielded more reasonable results for landslide susceptibility zoning, and was thus identified as the optimal model. The global interpretation results revealed that slope was the most crucial factor influencing landslides, and its interaction with other factors in the study area collectively contributed to landslide occurrences. The differences in the internal decision-making mechanisms of models based on different data types for the same factors primarily manifested in the extent of influence on prediction results and the dependency of factors, providing an explanation for the performance of standardized data in ML models and the reasons behind the higher predictive performance of coupled models based on conditional probability models and ML methods. Through comprehensive analysis of the local interpretation results from different models analyzing the same sample with different sample characteristics, the reasons for model prediction errors can be summarized, thereby providing a reference framework for constructing more accurate and rational landslide susceptibility models and facilitating landslide warning and management.

“…The landslide inventory reflects information such as spatial distribution, geometric size, and the attributes of landslides [30]. In this study, the landslide inventory can be divided into two categories:…”

Section: Landslide Inventory (mentioning)

confidence: 99%

Landslide Susceptibility Prediction Using Machine Learning Methods: A Case Study of Landslides in the Yinghu Lake Basin in Shaanxi

Ma, Chen et al. 2023

Landslide susceptibility prediction (LSP) is the basis for risk management and plays an important role in social sustainability. However, the modeling process of LSP is constrained by various factors. This paper examines the effects of landslide data integrity, machine-learning (ML) models, and non-landslide sample-selection methods on the accuracy of LSP, taking the Yinghu Lake Basin in Ankang City, Shaanxi Province, as an example. First, a previous landslide inventory (totaling 46) and an updated landslide inventory (totaling 46 + 176) were established through data collection, remote-sensing interpretation, and field investigation. With the slope unit as the mapping unit, twelve conditioning factors were selected: elevation, slope, aspect, topographic relief, elevation variation coefficient, slope structure, lithology, normalized difference vegetation index (NDVI), normalized difference built-up index (NDBI), distance to road, distance to river, and rainfall. Next, the initial landslide susceptibility mapping (LSM) was obtained using the K-means algorithm, and non-landslide samples were determined using two methods: random selection and semi-supervised machine learning (SSML). Finally, the random forest (RF) and artificial neural network (ANN) machine-learning methods were used for modeling. The research results showed the following: (1) The performance of supervised machine learning (SML) (RF, ANN) is generally superior to that of unsupervised machine learning (USML) (K-means). Specifically, RF has the best prediction performance among the SML models, followed by ANN. (2) The selection method of non-landslide samples has a significant impact on LSP, and the accuracy of the SSML-based non-landslide selection method is controlled by the ratio of the number of landslide samples to the number of mapping units. 
(3) The number of landslides affects the reliability of the LSM results, because fewer landslides yield a smaller sample size for LSM, which deviates from reality. Although the results for this dataset are satisfactory, the zoning results cannot reliably anticipate the recently added landslides identified through remote-sensing interpretation and field investigation. We propose enlarging the landslide inventory by remote sensing in order to achieve accurate and impartial LSM, since LSM built on adequate landslide samples is more reasonable. The research results of this paper will provide a reference for uncertainty analysis of LSP and for regional landslide risk management.
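
The simpler of the two non-landslide selection methods mentioned in this abstract, random selection, might be sketched as follows. This is only an illustration: the 1:1 ratio, the fixed seed, the integer unit IDs, and the function name are assumptions, and the SSML-based alternative is not shown.

```python
import random

def sample_non_landslide(all_units, landslide_units, ratio=1.0, seed=0):
    """Randomly pick non-landslide mapping units.

    ratio is the number of non-landslide samples per landslide sample
    (assumed 1.0 here for a balanced training set).
    """
    landslide_set = set(landslide_units)
    # Candidates are all mapping units with no recorded landslide.
    candidates = sorted(u for u in all_units if u not in landslide_set)
    k = min(len(candidates), round(ratio * len(landslide_set)))
    # A seeded generator keeps the draw reproducible.
    return random.Random(seed).sample(candidates, k)

# Hypothetical slope-unit IDs: 100 units, 10 of which contain landslides.
units = list(range(100))
landslide_units = list(range(10))
negatives = sample_non_landslide(units, landslide_units)
print(len(negatives), set(negatives).isdisjoint(landslide_units))
```

Because random selection can pick units that are in fact unstable but unrecorded, the abstract's comparison with a semi-supervised selection method addresses exactly this weakness.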