Herein, a recently developed methodology, Support Vector Machines (SVMs), is presented and applied to the challenge of soil moisture prediction. Support Vector Machines are derived from statistical learning theory and can be used to predict a quantity forward in time based on training that uses past data, hence providing a statistically sound approach to solving inverse problems. The principal strength of SVMs lies in the fact that they employ Structural Risk Minimization (SRM) instead of Empirical Risk Minimization (ERM). SVMs formulate a quadratic optimization problem that ensures a global optimum, which makes them superior to traditional learning algorithms such as Artificial Neural Networks (ANNs). The resulting model is sparse and does not suffer from the "curse of dimensionality." Knowledge of soil moisture distribution and variation is helpful in predicting and understanding various hydrologic processes, including weather changes, energy and moisture fluxes, drought, irrigation scheduling, and rainfall/runoff generation. Soil moisture and meteorological data are used to generate SVM predictions for four and seven days ahead. The predictions show good agreement with actual soil moisture measurements. Results from the SVM modeling are compared with predictions obtained from ANN models and show that the SVM models performed better than the ANN models for soil moisture forecasting.
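The forecasting setup described above (training on past data to predict four and seven days ahead) can be illustrated with a minimal sketch of how lagged training pairs might be built from a daily series. The function name `make_lagged_pairs`, the choice of three lags, and the synthetic series are assumptions made for illustration; the abstract does not state which inputs or lag structure the authors used, and the resulting pairs would then be passed to an SVM regression routine of the reader's choice.

```python
def make_lagged_pairs(series, n_lags, horizon):
    """Build supervised pairs from a time series.

    Each input row holds `n_lags` consecutive past values; the target is
    the value observed `horizon` steps after the last value in that row.
    (Hypothetical helper, not taken from the paper.)
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        window = series[start:start + n_lags]
        target = series[start + n_lags - 1 + horizon]
        inputs.append(list(window))
        targets.append(target)
    return inputs, targets


# Synthetic stand-in for ten daily soil moisture readings (assumed values).
daily = list(range(10))

# Four-day-ahead pairs: each row of three readings maps to the value 4 days later.
x4, y4 = make_lagged_pairs(daily, n_lags=3, horizon=4)

# Seven-day-ahead pairs: the longer horizon leaves fewer usable rows.
x7, y7 = make_lagged_pairs(daily, n_lags=3, horizon=7)
```

A longer horizon shortens the usable training set, which is one practical reason the four-day and seven-day models would be trained separately.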
Modeling of complex hydrologic processes has resulted in models that themselves exhibit a high degree of complexity and that require the determination of various parameters through calibration. In the current application we introduce a relatively new global optimization tool, called particle swarm optimization (PSO), that has already been applied in various other fields and has been reported to show effective and efficient performance. The PSO approach initially dealt with a single objective function but has been extended to deal with multiple objectives in a form called multiobjective particle swarm optimization (MOPSO). The algorithm is modified to account for multiobjective problems by introducing the Pareto rank concept. The new MOPSO algorithm is tested on three case studies. Two test functions are used as the first case study to generate the true Pareto fronts. The approach is further tested for parameter estimation of a well-known conceptual rainfall-runoff model, the Sacramento soil moisture accounting model, which has 13 parameters; the results are very encouraging. We also tested the MOPSO algorithm by calibrating a three-parameter support vector machine model for soil moisture prediction.
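The Pareto rank concept mentioned above can be sketched as follows. This is a minimal illustration that assumes all objectives are minimized and uses front peeling (rank 1 is the non-dominated set, rank 2 is the non-dominated set once rank 1 is removed, and so on). The abstract does not say which ranking variant the authors adopted, so the function names and the scheme itself are assumptions.

```python
def dominates(a, b):
    """Return True if objective vector `a` dominates `b` (minimization):
    `a` is no worse in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_ranks(points):
    """Assign a Pareto rank to each point by repeatedly peeling off the
    current non-dominated front (assumed variant, not the paper's code)."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Five candidate solutions evaluated on two objectives (made-up values).
candidates = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(pareto_ranks(candidates))  # prints [1, 1, 1, 2, 3]
```

In a swarm setting, ranks of this kind could be used to decide which particles act as guides; how the authors use the rank inside the PSO update is not stated in the abstract.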
A common practice in preprocessing data for use in hydrological modeling is to ignore observations with any missing variable values at a given time step, even if only one of the independent variables is missing. In most cases, these rows of data are labeled incomplete and are not used in either model building or subsequent model testing and verification. We argue that this is not necessarily an optimal approach for dealing with missing data because significant information could be lost when incomplete rows of data are discarded. Learning algorithms are affected by such problems more than physically based models because they rely heavily on data to learn the underlying input/output relationships of the systems being modeled. In this study, the extent of damage to the performance of learning algorithms due to missing data is explored in a field-scale application. To do so, we employed two well-known learning algorithms, namely artificial neural networks (ANNs) and support vector machines (SVMs), for short-term prediction of groundwater levels at a well field. Performance comparison is made by subjecting these algorithms to various levels of missing data. In addition to understanding the relative strengths of these algorithms in dealing with missing data, an approach for filling the data gaps in the form of an imputation methodology is proposed and tested against observed data. The utility of the current approach is further demonstrated by analyzing model runs obtained with and without imputed data. It is shown that as the percentage of missing data increases, the forecasting accuracy of ANNs is compromised more than that of SVMs. However, ANNs also derive the greater benefit from the use of imputed data.
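The contrast between discarding incomplete rows and filling the gaps can be made concrete with a small sketch. The abstract does not describe the imputation methodology that was proposed, so column-mean imputation is used here purely as a simple stand-in; the function names, the use of `None` to mark a missing value, and the sample rows are all assumptions.

```python
def drop_incomplete(rows):
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [row for row in rows if all(value is not None for value in row)]


def impute_column_means(rows):
    """Replace each missing value with the mean of the observed values in
    its column. A stand-in technique, not the method from the paper."""
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        observed = [row[col] for row in rows if row[col] is not None]
        if not observed:
            raise ValueError(f"column {col} has no observed values")
        means.append(sum(observed) / len(observed))
    return [
        [means[col] if value is None else value
         for col, value in enumerate(row)]
        for row in rows
    ]


# Three time steps, two variables; each of the last two rows has one gap.
records = [
    [1.0, 2.0],
    [None, 4.0],
    [3.0, None],
]

kept = drop_incomplete(records)        # only 1 of 3 rows survives
filled = impute_column_means(records)  # all 3 rows remain usable
```

Even with a single gap per row, listwise deletion here discards two thirds of the records, which is the kind of information loss the abstract argues against.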