Banghee So scite author profile

Banghee So

5Publications

24Citation Statements Received

126Citation Statements Given

How they've been cited

How they cite others

138

124

Affiliations

University of Connecticut

Publications

Order By: Most citations

Synthetic Dataset Generation of Driver Telematics

Boucher

Valdez

2021

Risks

View full text Add to dashboard Cite

This article describes the techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset. The synthetic dataset generated has 100,000 policies that included observations regarding driver’s claims experience, together with associated classical risk variables and telematics-related variables. This work is aimed to produce a resource that can be used to advance models to assess risks for usage-based insurance. It follows a three-stage process while using machine learning algorithms. In the first stage, a synthetic portfolio of the space of feature variables is generated applying an extended SMOTE algorithm. The second stage is simulating values for the number of claims as multiple binary classifications applying feedforward neural networks. The third stage is simulating values for aggregated amount of claims as regression using feedforward neural networks, with number of claims included in the set of feature variables. The resulting dataset is evaluated by comparing the synthetic and real datasets when Poisson and gamma regression models are fitted to the respective data. Other visualization and data summarization produce remarkable similar statistics between the two datasets. We hope that researchers interested in obtaining telematics datasets to calibrate models or learning algorithms will find our work ot be valuable.

show abstract

Cost-sensitive Multi-class AdaBoost for Understanding Driving Behavior with Telematics

Boucher

Valdez

2020

Preprint

View full text Add to dashboard Cite

Powered with telematics technology, insurers can now capture a wide range of data, such as distance traveled, how drivers brake, accelerate or make turns, and travel frequency each day of the week, to better decode driver's behavior. Such additional information helps insurers improve risk assessments for usage-based insurance (UBI), an increasingly popular industry innovation. In this article, we explore how to integrate telematics information to better predict claims frequency. For motor insurance during a policy year, we typically observe a large proportion of drivers with zero claims, a less proportion with exactly one claim, and far lesser with two or more claims. We introduce the use of a cost-sensitive multi-class adaptive boosting (AdaBoost) algorithm, which we call SAMME.C2, to handle such imbalances. To calibrate SAMME.C2 algorithm, we use empirical data collected from a telematics program in Canada and we find improved assessment of driving behavior with telematics relative to traditional risk variables. We demonstrate our algorithm can outperform other models that can handle class imbalances: SAMME, SAMME with SMOTE, RUSBoost, and SMOTEBoost. The sampled data on telematics were observations during 2013-2016 for which 50,301 are used for training and another 21,574 for testing. Broadly speaking, the additional information derived from vehicle telematics helps refine risk classification of drivers of UBI.

show abstract

Cost-Sensitive Multi-Class Adaboost for Understanding Driving Behavior Based on Telematics

So¹,

Boucher²,

Valdez³

2021

ASTIN Bull.

View full text Add to dashboard Cite

Using telematics technology, insurers are able to capture a wide range of data to better decode driver behavior, such as distance traveled and how drivers brake, accelerate, or make turns. Such additional information also helps insurers improve risk assessments for usage-based insurance, a recent industry innovation. In this article, we explore the integration of telematics information into a classification model to determine driver heterogeneity. For motor insurance during a policy year, we typically observe a large proportion of drivers with zero accidents, a lower proportion with exactly one accident, and a far lower proportion with two or more accidents. We here introduce a cost-sensitive multi-class adaptive boosting (AdaBoost) algorithm we call SAMME.C2 to handle such class imbalances. We calibrate the algorithm using empirical data collected from a telematics program in Canada and demonstrate an improved assessment of driving behavior using telematics compared with traditional risk variables. Using suitable performance metrics, we show that our algorithm outperforms other learning models designed to handle class imbalances.

show abstract

Cost-sensitive Multi-class AdaBoost for Understanding Driving Behavior with Telematics

Boucher

Valdez

2020

SSRN Journal

View full text Add to dashboard Cite

The SAMME.C2 algorithm for severely imbalanced multi-class classification

So¹,

Valdez²

2021

Preprint

View full text Add to dashboard Cite

Classification predictive modeling involves the accurate assignment of observations in a dataset to target classes or categories. There is an increasing growth of real-world classification problems with severely imbalanced class distributions. In this case, minority classes have much fewer observations to learn from than those from majority classes. Despite this sparsity, a minority class is often considered the more interesting class yet developing a scientific learning algorithm suitable for the observations presents countless challenges. In this article, we suggest a novel multi-class classification algorithm specialized to handle severely imbalanced classes based on the method we refer to as SAMME.C2. It blends the flexible mechanics of the boosting techniques from SAMME algorithm, a multiclass classifier, and Ada.C2 algorithm, a cost-sensitive binary classifier designed to address highly class imbalances. Not only do we provide the resulting algorithm but we also establish scientific and statistical formulation of our proposed SAMME.C2 algorithm. Through numerical experiments examining various degrees of classifier difficulty, we demonstrate consistent superior performance of our proposed model.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Banghee So

Synthetic Dataset Generation of Driver Telematics

Cost-sensitive Multi-class AdaBoost for Understanding Driving Behavior with Telematics

Cost-Sensitive Multi-Class Adaboost for Understanding Driving Behavior Based on Telematics

Cost-sensitive Multi-class AdaBoost for Understanding Driving Behavior with Telematics

The SAMME.C2 algorithm for severely imbalanced multi-class classification

Contact Info

Product

Resources

About