Customer churn is a central problem in almost every sector. Because of the diversity of customers, products and services, and the massive amount of data generated by e-commerce tools and services, (big) data analytics and artificial intelligence-based methods have been developed and used for churn analysis, with the aim of understanding the reasons behind customer churn and subsequently developing an effective and profitable customer retention programme. Analysis based on these methods focuses mainly on the profiling of customers, the classification of customer churn and the identification of features that affect churn. However, there do not seem to be many studies that help to understand how much a customer is likely (or not likely) to pay for products or services, whether churned or not, or that predict how much a particular customer or group of customers may have paid for the products or services. Therefore, in this study, a two-level churn analysis is proposed to (1) classify whether a customer churns or not, and (2) predict how much the customer has paid for the service. To achieve this, a machine learning method, namely the support vector machine (SVM), was used for the classification part, whereas the monthly service charge was predicted using the support vector regression (SVR) method. To select the most appropriate feature subset for both analyses, an unsupervised feature selection method, namely multi-cluster feature selection, was utilized. The same feature selection method was used for both analyses for the sake of consistency and to understand its performance across both. The proposed hybrid approach was then applied to IBM's Telco data set, with over 7000 customers, in order to demonstrate the applicability and generalization ability of the proposed two-level approach.
The SVM-based classification method yielded an AUC of 85.6 and a total classification accuracy of 81.5%, both higher than those of a recent study in which an aggressive set of supervised classification methods was applied. The SVR-based prediction of the monthly charge resulted in an RMSE of 1.27, which is a reasonably acceptable outcome in the sector given the wide range of charges, as evidenced by their standard deviation. The approach presented in the study demonstrates that churn classification and charge prediction can be performed at the same time with a high degree of accuracy. As the approach is open to further improvement, future analysis will be carried out to improve the accuracy of both analyses and to apply them to other data sets to demonstrate robustness and generalization ability.
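The two-level idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, runs on synthetic data rather than the IBM data set, and uses a simplified form of multi-cluster feature selection (a spectral embedding followed by sparse LARS regression, with each feature scored by its largest absolute coefficient). The function names `mcfs_scores` and `two_level_demo`, the synthetic charge formula and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lars
from sklearn.manifold import SpectralEmbedding
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR


def mcfs_scores(X, n_components=2, n_nonzero=5, n_neighbors=10, seed=0):
    """Simplified MCFS (assumption): score features without using any labels."""
    # Step 1: embed the samples using a nearest-neighbour graph.
    embedding = SpectralEmbedding(
        n_components=n_components, n_neighbors=n_neighbors, random_state=seed
    ).fit_transform(X)
    # Step 2: sparse regression of each embedding axis on the features.
    coefs = np.array([
        Lars(n_nonzero_coefs=n_nonzero).fit(X, embedding[:, k]).coef_
        for k in range(n_components)
    ])
    # Step 3: a feature's score is its largest absolute coefficient.
    return np.abs(coefs).max(axis=0)


def two_level_demo(n_selected=6, seed=0):
    """Run churn classification (SVM) and charge regression (SVR) together."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for customer features and the churn label.
    X, churn = make_classification(
        n_samples=600, n_features=12, n_informative=4, n_redundant=2,
        shuffle=False, random_state=seed,
    )
    # Hypothetical monthly charge driven by two features plus noise.
    charge = 65.0 + 10.0 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(0.0, 1.0, len(X))

    X_tr, X_te, churn_tr, churn_te, charge_tr, charge_te = train_test_split(
        X, churn, charge, test_size=0.25, random_state=seed, stratify=churn
    )

    # One unsupervised feature subset, shared by both levels.
    scores = mcfs_scores(StandardScaler().fit_transform(X_tr), seed=seed)
    selected = np.argsort(scores)[::-1][:n_selected]

    # Level 1: classify churn with an SVM.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X_tr[:, selected], churn_tr)
    # Level 2: predict the monthly charge with SVR.
    reg = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
    reg.fit(X_tr[:, selected], charge_tr)

    churn_pred = clf.predict(X_te[:, selected])
    churn_score = clf.decision_function(X_te[:, selected])
    charge_pred = reg.predict(X_te[:, selected])
    return {
        "selected": sorted(int(i) for i in selected),
        "accuracy": float(accuracy_score(churn_te, churn_pred)),
        "auc": float(roc_auc_score(churn_te, churn_score)),
        "rmse": float(np.sqrt(mean_squared_error(charge_te, charge_pred))),
    }


results = two_level_demo()
print(results)
```

Because the feature scores are computed without labels, the same selected subset can feed both the classifier and the regressor, which mirrors the consistency argument made in the abstract. The figures printed by this sketch come from synthetic data and are not comparable with the reported AUC, accuracy or RMSE.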