When analyzing data, it is crucial to choose the model that best fits the problem at hand. Many researchers have proposed ensemble strategies for tabular data, along with various other approaches to classification and regression problems. In this paper, the Gini index is applied to a raw geographical dataset to convert its continuous attributes into discrete ones. A decision tree algorithm is then implemented on the resulting discrete dataset: information gain is calculated for every attribute, the attribute with the highest information gain is chosen as the splitting node, and the procedure is applied recursively. The resulting decision tree predicts rainfall in Kashmir province with an accuracy of 81.5%. MDL (minimum description length) pruning is then applied to the tree to reduce its size and complexity. Pruning removes segments of the tree that contribute little to classification, and the accuracy decreases marginally to 81.1%. Furthermore, a boosting algorithm, gradient boosting, is implemented on the same data using the decision tree as the base estimator. With the gradient boosting model, the overall accuracy increases to 87.5%. Thus, the results indicate that the gradient-boosted decision tree outperforms the other approaches, with the highest accuracy and a high susceptibility rate in rainfall prediction.
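The first two steps described above (Gini-based discretization, then choosing the split attribute by information gain) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The attribute names (`humidity`, `season`), the six toy records, and the choice of a single binary cut point per continuous attribute are all assumptions made for illustration; the abstract does not describe the dataset's columns or the number of bins used.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_gini_threshold(values, labels):
    """Cut point (midpoint between neighbouring distinct values) that
    minimises the weighted Gini impurity of the two resulting groups."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best_t, best_score = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def information_gain(column, labels):
    """Entropy of the labels minus the weighted entropy after
    partitioning the records by the values of a discrete attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(column, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records, NOT taken from the paper's dataset.
humidity = [30, 45, 50, 70, 80, 90]           # continuous attribute
season = ["a", "b", "a", "b", "a", "b"]       # already discrete
rain = ["no", "no", "no", "yes", "yes", "yes"]

# Step 1: discretize the continuous attribute with a Gini-chosen cut point.
threshold = best_gini_threshold(humidity, rain)
humidity_bins = ["high" if x > threshold else "low" for x in humidity]

# Step 2: pick the attribute with the highest information gain as the split.
gains = {
    "humidity": information_gain(humidity_bins, rain),
    "season": information_gain(season, rain),
}
root = max(gains, key=gains.get)
print(threshold, gains, root)
```

In a full tree builder, step 2 would be repeated on each partition until the nodes are pure or no attributes remain; the pruning and gradient boosting stages are not covered by this sketch.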