2019
DOI: 10.1007/978-3-030-30967-1_9

Optimizing Ensemble Weights for Machine Learning Models: A Case Study for Housing Price Prediction

Abstract: Designing ensemble learners has been recognized as one of the significant trends in the field of machine learning, especially in data science competitions. The main goal of studies in this area has been to build models that outperform all of their individual base learners in terms of both bias, the error due to the difference between average model predictions and actual values, and variance, the variability of model predictions. An optimization model has been proposed in this paper to design ensemble…

Cited by 26 publications (36 citation statements)
References 16 publications
“…Based on the bias-variance tradeoff, the objective function of the optimization problem can be the mean squared error (MSE) of the out-of-bag predictions for the ensemble (Hastie et al., 2009). The out-of-bag predictions matrix created previously can be used as an emulator of unseen test observations (Shahhosseini et al., 2019b). Using the out-of-bag predictions, we propose the following nonlinear convex optimization problem.…”
Section: Methods (mentioning)
confidence: 99%
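
The weight optimization this statement describes can be sketched as a small convex program: given an out-of-bag prediction matrix with one column per base learner, find non-negative weights summing to one that minimize the MSE of the weighted ensemble. The following is a minimal illustration assuming NumPy and SciPy; the names (optimize_ensemble_weights, oob_preds, y_true) are ours, not the paper's.

import numpy as np
from scipy.optimize import minimize

def optimize_ensemble_weights(oob_preds, y_true):
    """Convex-combination weights minimizing MSE of out-of-bag predictions.

    oob_preds: (n_samples, n_learners) out-of-bag predictions, one column
               per base learner; y_true: (n_samples,) observed targets.
    """
    n_learners = oob_preds.shape[1]

    def mse(w):
        return np.mean((oob_preds @ w - y_true) ** 2)

    # Restrict weights to the probability simplex: w >= 0 and sum(w) == 1.
    constraints = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}
    bounds = [(0.0, 1.0)] * n_learners
    w0 = np.full(n_learners, 1.0 / n_learners)  # start from equal weights

    return minimize(mse, w0, method="SLSQP",
                    bounds=bounds, constraints=constraints).x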
“…Studies show that a single machine learning model can be outperformed by a “committee” of individual models, which is called a machine learning ensemble (Zhang and Ma, 2012). Ensemble learning has proved effective because, if the base learners are diverse enough, it can reduce bias, variance, or both, and can better capture the underlying distribution of the data in order to make better predictions (Dietterich, 2000; Pham and Olafsson, 2019a; Pham and Olafsson, 2019b; Shahhosseini et al., 2019a; Shahhosseini et al., 2019b). The use of ensemble learning in ecological problems is becoming more widespread; for instance, bagging and specifically random forest (Vincenzi et al., 2011; Mutanga et al., 2012; Fukuda et al., 2013; Jeong et al., 2016), boosting (De'ath, 2007; Heremans et al., 2015; Belayneh et al., 2016; Stas et al., 2016; Sajedi-Hosseini et al., 2018), and stacking (Conţiu and Groza, 2016; Cai et al., 2017; Shahhosseini et al., 2019a) are some of the ensemble learning applications in agriculture.…”
Section: Introduction (mentioning)
confidence: 99%
“…Moreover, an average weighted ensemble that assigns equal weights to all base learners is the simplest ensemble model created. Additionally, the optimized weighted ensemble method proposed in Shahhosseini et al. 60 was applied here to test its predictive performance. Several two-level stacking ensembles, namely stacked regression, stacked LASSO, stacked random forest, and stacked LightGBM, were built, which are expected to demonstrate excellent performance.…”
Section: Methods (mentioning)
confidence: 99%
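
A two-level stacking ensemble of the kind this statement lists can be sketched with scikit-learn's StackingRegressor; the specific base learners and the ridge meta-learner below are illustrative assumptions, not the cited study's exact configuration.

from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Lasso, Ridge

# Level-0 base learners feed cross-validated predictions to the
# level-1 meta-learner (an illustrative configuration).
stack = StackingRegressor(
    estimators=[
        ("lasso", Lasso(alpha=0.01)),
        ("rf", RandomForestRegressor(n_estimators=300, random_state=0)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,  # out-of-fold predictions train the meta-learner
)
# Usage: stack.fit(X_train, y_train); stack.predict(X_test)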
“…An optimization model was proposed in Shahhosseini et al. 60 , which accounts for the tradeoff between the bias and variance of the predictions, as it uses mean squared error (MSE) to form the objective function of the optimization problem 68 . In addition, out-of-bag predictions generated by k-fold cross-validation are used as emulators of unseen test observations to create the input matrices of the optimization problem, which are the out-of-bag predictions made by each base learner.…”
Section: Methods (mentioning)
confidence: 99%
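
Generating that out-of-bag input matrix via k-fold cross-validation might look like the sketch below; cross_val_predict is a standard scikit-learn utility, while the helper name and base-learner list are our assumptions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_predict

def oob_prediction_matrix(learners, X, y, k=5):
    """Column-stack k-fold out-of-fold predictions, one column per learner.

    Every entry is predicted by a model that never saw that row during
    training, so the matrix emulates unseen test observations.
    """
    return np.column_stack([cross_val_predict(est, X, y, cv=k)
                            for est in learners])

# Illustrative base learners (not the cited paper's exact set):
# oob = oob_prediction_matrix([Lasso(alpha=0.01),
#                              RandomForestRegressor(n_estimators=300)], X, y)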
“…Shahhosseini et al. [41] compare the behaviour of several ensemble models for the prediction of dwelling prices using two databases widely cited in the relevant literature: the Boston metropolitan area dataset [42] and the database of residential home sales in Ames, Iowa, presented in [43]. To demonstrate the validity of the ensemble models, they use multiple learners, including lasso regression, random forest, deep neural networks, extreme gradient boosting (XGBoost), and support vector machines with three kernels (polynomial, RBF, and sigmoid).…”
Section: Literature Review (mentioning)
confidence: 99%