2020
DOI: 10.1080/09599916.2020.1832558

Predicting property prices with machine learning algorithms

Abstract: This study uses three machine learning algorithms, support vector machine (SVM), random forest (RF) and gradient boosting machine (GBM), in the appraisal of property prices. It applies these methods to a data sample of about 40,000 housing transactions spanning more than 18 years in Hong Kong, and then compares the results of the algorithms. In terms of predictive power, RF and GBM achieved better performance than SVM. The three performance metrics including mean squared er…
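As a rough illustration of the comparison the abstract describes, the sketch below fits the three algorithm families and compares their test errors. This is not the paper's code: the synthetic data, the scikit-learn estimators, and all hyperparameter values are assumptions standing in for the Hong Kong transaction sample and the study's actual tuning.

```python
# Hypothetical sketch: compare SVM, random forest and gradient boosting
# on a housing-price regression task using synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "SVM": SVR(kernel="rbf", C=10.0),
    "RF": RandomForestRegressor(n_estimators=300, random_state=0),
    "GBM": GradientBoostingRegressor(n_estimators=300, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.2f}")
```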

Cited by 157 publications (111 citation statements). References 56 publications.
“…Table 1 presented the dataset description with the mean, median and standard deviation computed for ten feature variables surveyed, including eight internal structure characteristics and two external characteristics [ 59 , 62 ]. Resulting from the comparison of the median and the mean for the variables in Table 1 , the samples collected might be considered to obey normal distribution, essentially providing unbiased baselines to characterize and understand the effect of the features of interest on housing prices in Boston.…”
Section: Results (mentioning)
Confidence: 99%
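The mean-versus-median comparison in the statement above is an informal symmetry check: a small gap between the two is taken as weak evidence that a variable is roughly symmetric. A minimal sketch of producing such a summary table (the column names and values are placeholders, not the cited study's variables):

```python
# Hypothetical sketch: per-feature mean, median and standard deviation,
# with the mean-median gap used as a crude indicator of skew.
import pandas as pd

df = pd.DataFrame({
    "floor_area": [45.0, 52.0, 60.0, 48.0, 55.0],  # placeholder values
    "age_years": [12, 30, 8, 25, 18],
})

summary = df.agg(["mean", "median", "std"]).T
summary["mean_median_gap"] = (summary["mean"] - summary["median"]).abs()
print(summary)
```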
“…Each feature was generated with the chosen algorithms which were evaluated using the performance metrics i.e., the R² and the RMSE. The Root Mean Square Error is a standard way for measuring how errorless the model is in predicting quantitative data; meanwhile, the R² indicates the significant effects on the dependent variable (Ho, 2020;Yilmazer, 2020).…”
Section: Results (mentioning)
Confidence: 99%
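For reference, the two metrics named in this statement can be computed directly. A minimal sketch (scikit-learn and the toy values are assumptions, not the cited paper's tooling or data):

```python
# RMSE = sqrt(mean((y - y_hat)^2)); R^2 = 1 - SS_res / SS_tot.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.2, 4.1, 5.0, 6.3])  # placeholder observed prices
y_pred = np.array([3.0, 4.3, 4.8, 6.0])  # placeholder model predictions

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
print(f"RMSE = {rmse:.3f}, R^2 = {r2:.3f}")
```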
“…GTB is a tree-based ensemble model that combines numerous weak classifiers to provide accurate classification [ 35 ]. It is a marked improvement on the classification performance of RF models and can avoid the problem of multi-collinearity [ 24 , 25 ]. Furthermore, XGB is an optimized type of GTB model and is more efficient than other conventional models; in particular, it has the ability to prevent overfitting through regularization [ 24 ].…”
Section: Discussion (mentioning)
Confidence: 99%
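The regularization point about XGB can be illustrated with the penalty parameters the XGBoost library exposes. This is only a hedged sketch: the parameter values are arbitrary and are not the cited studies' settings.

```python
# Hypothetical sketch: gradient-boosted trees with explicit L1/L2
# regularization terms, the mechanism XGBoost uses to discourage
# overly complex trees and thus reduce overfitting.
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=500,
    max_depth=4,
    learning_rate=0.05,
    reg_alpha=0.1,   # L1 penalty on leaf weights
    reg_lambda=1.0,  # L2 penalty on leaf weights
    subsample=0.8,   # row subsampling, a further guard against overfitting
)
# model.fit(X_train, y_train) would then be called on the study's feature matrix.
```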
“…Whenever SNP data were missing, we imputed the mode of the same disease for each SNP. While tuning the hyperparameters, the hyperparameters are optimized through 5-fold cross-validation only for the training set ( Supplementary Table 1 ) whereas the optimization of hyperparameters was not executed in testing set [ 25 ].…”
Section: Methods (mentioning)
Confidence: 99%
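A minimal sketch of that protocol, imputation followed by hyperparameter search with 5-fold cross-validation restricted to the training split, is shown below. The estimator, grid, and data are placeholders, and a simple global-mode imputer stands in for the per-disease mode imputation described above.

```python
# Hypothetical sketch: impute missing values with the mode, then tune
# hyperparameters by 5-fold CV on the training set only; the held-out
# test set is touched once, for the final evaluation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),  # mode imputation
    ("clf", RandomForestClassifier(random_state=0)),
])
grid = GridSearchCV(pipe, {"clf__n_estimators": [100, 300], "clf__max_depth": [3, 6]}, cv=5)
grid.fit(X_train, y_train)  # tuning sees only the training set
print(grid.best_params_, grid.score(X_test, y_test))
```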