Missing data is a frequent problem in climatology; it degrades the quality of results in hydrological studies and in water resources management. This paper proposes a new imputation algorithm based on the optimization of several existing methods: hot deck, k-nearest-neighbors imputation, weighted k-nearest-neighbors imputation, multiple imputation, linear regression, and the simple average method. The choice of these methods was justified by qualitative and quantitative statistical tests. However, the reliability of the results depends mainly on the percentage of missing data, the choice of neighboring stations, and the missingness mechanism, which should be missing at random. The study found that most stations in the Soummam watershed are poorly correlated, either because of the large loss of rainfall data or because of the geology of the watershed, which links station position to rainfall variability. For this case, principal component analysis was applied to a set of stations; it showed a positive impact of altitude, latitude, and longitude on the correlation index between selected stations. A graphical analysis of the normal distribution of the RMSE values, obtained by applying the proposed technique to several random missingness cases (4%, 8%, 12%, and 16%), confirmed the validity and performance of the approach.
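As an illustration of one of the listed building blocks, the following is a minimal sketch of weighted k-nearest-neighbors imputation for a single station, with an RMSE check under random missingness. It is not the paper's optimized algorithm: the function names (`wknn_impute`, `rmse`, `evaluate_rate`), the inverse-distance weighting, and the use of `None` for gaps are all assumptions made for this example, and neighbor series are assumed to be observed wherever the target is masked.

```python
import math
import random


def wknn_impute(target, neighbors, k=3):
    """Fill None entries of `target` from the k most similar neighbor series.

    Similarity is the RMS difference over time steps where both series are
    observed; donors are weighted by inverse distance (an assumed scheme).
    """
    ranked = []
    for series in neighbors:
        pairs = [(t, s) for t, s in zip(target, series)
                 if t is not None and s is not None]
        if not pairs:
            continue  # no overlap, so this station cannot be ranked
        dist = math.sqrt(sum((t - s) ** 2 for t, s in pairs) / len(pairs))
        ranked.append((dist, series))
    ranked.sort(key=lambda item: item[0])

    filled = list(target)
    for i, value in enumerate(target):
        if value is not None:
            continue
        # Closest k stations that actually have a value at this time step.
        donors = [(d, s[i]) for d, s in ranked if s[i] is not None][:k]
        if not donors:
            continue  # leave the gap when no donor is available
        weights = [1.0 / (d + 1e-9) for d, _ in donors]
        total = sum(w * v for w, (_, v) in zip(weights, donors))
        filled[i] = total / sum(weights)
    return filled


def rmse(truth, estimate, indices):
    """Root mean square error restricted to the artificially masked indices."""
    return math.sqrt(
        sum((truth[i] - estimate[i]) ** 2 for i in indices) / len(indices)
    )


def evaluate_rate(truth, neighbors, rate, k=3, seed=0):
    """Mask a random fraction `rate` of a complete series, impute, return RMSE."""
    rng = random.Random(seed)
    n_missing = max(1, round(rate * len(truth)))
    masked = set(rng.sample(range(len(truth)), n_missing))
    gappy = [None if i in masked else v for i, v in enumerate(truth)]
    filled = wknn_impute(gappy, neighbors, k=k)
    return rmse(truth, filled, sorted(masked))
```

Calling `evaluate_rate` with rates of 0.04, 0.08, 0.12, and 0.16 over many seeds would give a set of RMSE values per missingness level, comparable in spirit to the cases described above.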
Monthly precipitation recorded over 51 years at 24 stations in the Soummam watershed in Algeria was analyzed using statistical modeling to describe rainfall trends and the aridity state of the area. The choice of distribution laws was justified by comparing the fitting results of different distribution laws used in the literature. The p-values showed that the Generalized Extreme Value, Weibull (3), and Logistic distribution laws are the most adequate for analyzing rainfall frequencies in different parts of the watershed. The diagnostics given by the Q‐Q plot, P‐P plot, and survival regression curve showed periods of wetness and dryness in the northeastern and southwestern parts of the watershed, respectively. Moreover, the De Martonne index study shows the consequences of climate change as a new form of aridity in the watershed between 1994 and 2018.
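For readers unfamiliar with the De Martonne index, it is usually defined as annual precipitation in millimetres divided by mean annual temperature in degrees Celsius plus 10, with a monthly form scaled by 12. The sketch below shows both forms; the function names are hypothetical, and the abstract does not say which form or which class thresholds the authors used.

```python
def de_martonne_annual(precip_mm, temp_c):
    """Annual De Martonne aridity index: P / (T + 10)."""
    if temp_c <= -10.0:
        raise ValueError("index is undefined for mean temperatures <= -10 C")
    return precip_mm / (temp_c + 10.0)


def de_martonne_monthly(precip_mm, temp_c):
    """Monthly form, scaled to an annual equivalent: 12 * p / (t + 10)."""
    if temp_c <= -10.0:
        raise ValueError("index is undefined for mean temperatures <= -10 C")
    return 12.0 * precip_mm / (temp_c + 10.0)
```

Lower values indicate more arid conditions, so a downward drift of the index over successive periods is one way a change in aridity could appear.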
Recommendations for Resource Managers
The Soummam watershed had a moderate and irregular annual rainfall distribution between 1967 and 2018.
Applying distribution functions to monthly rainfall in each bioclimatic floor to analyze the trend of rainfall frequency gives a spatio‐temporal description of the climate in the area.
Fitting with the Kolmogorov‐Smirnov test allows us to choose the Generalized Extreme Value, Weibull (3), and Logistic distributions for modeling monthly rainfall variability in each part of the watershed.
The diagnostics obtained from the P‐P plot, Q‐Q plot, and survival regression curve showed a change of aridity in the northeastern and southwestern parts of the watershed between 1994 and 2018.
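The Kolmogorov‐Smirnov comparison mentioned above can be sketched as follows. This example computes only the KS distance between a sample and a candidate distribution, using the Logistic law with a method-of-moments fit because its CDF has a closed form. The fitting method and function names are assumptions for illustration; in practice a statistics library such as SciPy (`scipy.stats.kstest`) would typically be used, and it would also cover the Generalized Extreme Value and three-parameter Weibull candidates.

```python
import math
import statistics


def logistic_cdf(x, loc, scale):
    """CDF of the logistic distribution."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))


def fit_logistic_moments(sample):
    """Method-of-moments fit: variance of a logistic law is scale^2 * pi^2 / 3."""
    loc = statistics.fmean(sample)
    scale = statistics.stdev(sample) * math.sqrt(3.0) / math.pi
    return loc, scale


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```

A smaller distance means a closer fit, so candidates can be ranked by it. Note that p-values from the standard KS tables are only approximate when the parameters were estimated from the same sample.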
Watershed climatic diversity poses a hard problem when it comes to finding suitable models to estimate inter-annual rainfall runoff (IARR). In this work, a hybrid model (dubbed MR-CART) is proposed, based on a combination of the MR (multiple regression) and CART (classification and regression tree) machine-learning methods, applied to IARR predicted data series obtained from a set of non-parametric and empirical water balance models in five climatic floors of northern Algeria between 1960 and 2020. A comparative analysis showed that the Yang, Sharif, and Zhang models were reliable for estimating the input data of the hybrid model in all climatic classes. In addition, Schreiber’s model was more efficient in very humid, humid, and semi-humid areas. A set of performance and distribution statistical tests was applied to the estimated IARR data series to show the reliability and dynamicity of each model in all study areas. The results showed that the hybrid model provided the best performance and data distribution, with adjusted R² values in the range (0.793, 0.989) and p-values in the range (0.773, 0.939). The MR model showed good data distribution compared to the CART method, where the p-values obtained by the sign test and the Wilcoxon signed-rank (WSR) test were (0.773, 0.705) and (0.326, 0.335), respectively.
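Two of the statistics quoted above can be sketched in a few lines: the adjusted R² and a two-sided exact sign test. The sketch assumes the sign test is applied to paired observed and predicted IARR values, which the abstract does not state explicitly, and the function names are hypothetical.

```python
import math


def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def sign_test_pvalue(observed, predicted):
    """Two-sided exact sign test on paired differences (ties are dropped)."""
    diffs = [o - p for o, p in zip(observed, predicted) if o != p]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Under the null hypothesis, signs follow Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

Under this reading, a large p-value means the test finds no systematic over- or under-estimation by the model, which is consistent with high p-values being reported as a sign of good data distribution.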