Seba Susan scite author profile

Millions of people have been infected and lakhs of people have lost their lives due to the ongoing worldwide novel Coronavirus (COVID-19) pandemic. It is of utmost importance to identify future infected cases and the virus spread rate so that healthcare services can prepare in advance and avoid deaths. Accurately forecasting the spread of COVID-19 is a challenging analytical real-world problem for the research community. Therefore, we use day-level information on the COVID-19 spread for cumulative cases from the whole world and the 10 most affected countries: US,


The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art

Susan

Kumar

2020

Engineering Reports


This survey paper focuses on one of the current primary issues challenging data mining researchers experimenting on real-world datasets. The problem is that of imbalanced class distribution that generates a bias toward the majority class due to insufficient training samples from the minority class. The current machine learning and deep learning algorithms are trained on datasets that are insufficiently represented in certain categories. On the other hand, some other classes have surplus samples due to the ready availability of data from these categories.

Conventional solutions suggest undersampling of the majority class and/or oversampling of the minority class for balancing the class distribution prior to the learning phase. Though this problem of uneven class distribution is, by and large, ignored by researchers focusing on the learning technology, a need has now arisen for incorporating balance correction and data pruning procedures within the learning process itself. This paper surveys a plethora of conventional and recent techniques that address this issue through intelligent representations of samples from the majority and minority classes, that are given as input to the learning module. The application of nature-inspired evolutionary algorithms to intelligent sampling is examined, and so are hybrid sampling strategies that select and retain the difficult-to-learn samples and discard the easy-to-learn samples. The findings by various researchers are summarized to a logical end, and various possibilities and challenges for future directions in research are outlined.

KEYWORDS
class-imbalance problem, hybrid sampling, imbalanced data, oversampling, sampling, undersampling

INTRODUCTION
Learning from imbalanced datasets results in a bias toward the majority class whose labeled samples are available in plenty as compared to the insufficiently represented minority class. [1] In data mining, factors that bring down the classifier performance are the intrinsic characteristics of the data and an uneven class distribution. [2] Lack of adequate data in the minority class results in a fuzzy and ever-varying decision boundary, leading to erroneous results. The problem is…

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
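The conventional balancing step mentioned in the abstract above (undersampling the majority class and/or oversampling the minority class before learning) might look roughly like the sketch below. It illustrates plain random resampling only and is not the survey authors' method or any of the optimized techniques they review. The function name `random_resample` and its `strategy` and `seed` parameters are hypothetical, chosen for this example.

```python
import random
from collections import Counter, defaultdict


def random_resample(samples, labels, strategy="over", seed=0):
    """Balance a labeled dataset by random over- or undersampling.

    Hypothetical helper for illustration only:
      strategy="over"  duplicates randomly chosen samples of smaller classes
                       until every class matches the largest class.
      strategy="under" keeps a random subset of larger classes so every
                       class matches the smallest class.
    """
    if strategy not in ("over", "under"):
        raise ValueError("strategy must be 'over' or 'under'")

    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    sizes = [len(xs) for xs in by_class.values()]
    target = max(sizes) if strategy == "over" else min(sizes)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) > target:
            # Undersample: draw `target` samples without replacement.
            chosen = rng.sample(xs, target)
        else:
            # Oversample: keep all samples, then add random duplicates.
            extra = [rng.choice(xs) for _ in range(target - len(xs))]
            chosen = xs + extra
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy data: 10 majority-class samples (label 0), 2 minority-class samples (label 1).
xs = list(range(12))
ys = [0] * 10 + [1] * 2

over_x, over_y = random_resample(xs, ys, strategy="over")
under_x, under_y = random_resample(xs, ys, strategy="under")
print(Counter(over_y))   # each class now has 10 samples
print(Counter(under_y))  # each class now has 2 samples
```

Random oversampling only repeats existing minority samples and random undersampling discards majority samples blindly, which is the kind of limitation that motivates the intelligent and hybrid sampling strategies the survey covers.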


Automatic texture defect detection using Gaussian mixture entropy modeling

Susan

Sharma

2017

Neurocomputing


Deep transfer with minority data augmentation for imbalanced breast cancer dataset

Saini

Susan

2020

Applied Soft Computing

107



Seba Susan

Fuzzy rule based unsupervised sentiment analysis from social media posts

COVID-19 Pandemic Prediction using Time Series Forecasting Models
