Computer Science & Information Technology (CS & IT) 2019
DOI: 10.5121/csit.2019.90611

Coping With Class Imbalance in Classification of Traffic Crash Severity Based on Sensor and Road Data: A Feature Selection and Data Augmentation Approach

Abstract: This paper presents machine learning-based approaches to classification of historical traffic crashes in Kansas by severity, applied to a data set consisting of highway geometry, weather, and road sensor data. The goal of this work is to identify relevant features using a variety of loss measures and algorithms for feature selection. This is shown to facilitate the discovery of the most relevant sensors for the task of learning to predict severe crashes (those involving bodily injury). The key technical challe…

Cited by 6 publications (3 citation statements)
References 34 publications
“…In addition, the binary classification problem has a comparatively low prediction complexity in terms of classification prediction. For example, Delen et al (2017) and Lamba et al (2019) built the human injury prediction models with high/low injury degree classification based on the SVM algorithm, and the accuracies were 0.904 and 0.966, respectively. The prediction category in this study is five, which makes it more difficult to predict than a binary classification problem.…”
Section: Discussion (mentioning)
confidence: 99%
“…Considering the importance of data imbalance and its prevalence in the real world, modelers have been looking for appropriate strategies to tackle this challenge. In particular, successful applications of data balancing techniques and consequent model improvements have been recently reported in safety analysis literature (73)(74)(75)(76). Ahmadi et al showed that a SVM model slightly outperformed multinomial and mixed logit models, provided that model parameters are efficiently tuned.…”
Section: Data Imbalance (mentioning)
confidence: 99%
“…They addressed data imbalance by tuning separate cost parameters in their SVM model structure (73). Lamba et al (74) used a variety of over- and undersampling techniques and showed that a combination of algorithmic feature selection with random over-sampling provides the best model performance for precision and recall. Yahaya et al used a variety of variable selection techniques and combined them with SMOTE oversampling strategy.…”
Section: Data Imbalance (mentioning)
confidence: 99%
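
The random over-sampling mentioned in the citation statement above can be illustrated with a minimal sketch. This is not the cited paper's implementation: the function name `random_oversample`, the toy feature rows, and the class labels are all hypothetical, and the sketch assumes the simplest variant, in which minority-class rows are duplicated at random until every class matches the size of the largest class.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all
    classes have as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data so no existing row is dropped.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to duplicate for this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy, made-up data: four non-severe crashes and one severe crash.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["non-severe"] * 4 + ["severe"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, over-sampling of this kind is usually applied only to the training split, so that duplicated rows do not leak into the evaluation data.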