Cardiovascular diseases are considered the most life-threatening syndromes, with the highest mortality rate globally. Over time, they have become very common and are now overstretching national healthcare systems. The major risk factors for cardiovascular diseases are high blood pressure, family history, stress, age, gender, cholesterol, Body Mass Index (BMI), and an unhealthy lifestyle. Based on these factors, researchers have proposed various approaches for early diagnosis. However, the accuracy of the proposed techniques and approaches needs improvement, given the inherent criticality and life-threatening risks of cardiovascular diseases. In this article, the MaLCaDD (Machine Learning based Cardiovascular Disease Diagnosis) framework is proposed for the effective prediction of cardiovascular diseases with high precision. In particular, the framework first deals with missing values (via the mean replacement technique) and data imbalance (via the Synthetic Minority Over-sampling Technique, SMOTE). Subsequently, the Feature Importance technique is used for feature selection. Finally, an ensemble of Logistic Regression and K-Nearest Neighbor (KNN) classifiers is proposed for prediction with higher accuracy. The framework is validated on three benchmark datasets (i.e., Framingham, Heart Disease, and Cleveland), achieving accuracies of 99.1%, 98%, and 95.5%, respectively. Finally, the comparative analysis shows that MaLCaDD predictions are more accurate (with a reduced set of features) than existing state-of-the-art approaches. Therefore, MaLCaDD is highly reliable and can be applied in real environments for the early diagnosis of cardiovascular diseases.
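The two preprocessing steps named above, mean replacement and SMOTE, can be illustrated with a minimal sketch. This is not the MaLCaDD implementation: the helper names `impute_mean` and `smote`, the toy records, and the choice of `k` are assumptions made purely for illustration, and the SMOTE variant shown is the basic one (interpolating between a minority sample and one of its nearest minority neighbours).

```python
import random
from statistics import mean


def impute_mean(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def smote(minority, n_new, k=1, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy records: [systolic blood pressure, cholesterol].
rows = [
    [120.0, 200.0],
    [None, 240.0],
    [140.0, None],
    [160.0, 220.0],
    [150.0, 260.0],
    [170.0, 300.0],
]
labels = [0, 0, 0, 0, 1, 1]  # 1 marks the minority (disease) class

imputed = impute_mean(rows)
minority = [x for x, y in zip(imputed, labels) if y == 1]
n_needed = labels.count(0) - labels.count(1)
synthetic = smote(minority, n_needed, k=1)

balanced_rows = imputed + synthetic
balanced_labels = labels + [1] * len(synthetic)
```

After these steps the toy dataset has four samples per class, and each synthetic sample lies on the line segment between the two original minority records.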
Numerous e-news channels publish daily happenings around the world from different sources. This huge volume of news articles has unfortunately created an information overload problem for users. Text mining, which aims to extract previously unknown information from unstructured text, has therefore been widely used by researchers to categorize full news articles; however, work on categorizing news headlines is still limited. Considering this limitation, the current research proposes a framework that self-learns and automatically classifies any given news headline into its corresponding news category using artificial intelligence methods, i.e., text mining and machine learning algorithms. The proposed framework consists of three stages: Exploratory Data Analysis, Text Pre-processing, and Text Classification. For exploratory data analysis, the top 10 most frequent and balanced news categories are chosen so that further processing can be done on a more balanced version of the dataset. After exploring the data, text pre-processing techniques are applied to transform, normalize, and structure the data. Finally, text classification is carried out with two approaches: unsupervised classification using the Mean Shift and K-means algorithms, and supervised classification using Logistic Regression with Bag of Words and TF-IDF representations. To illustrate the working of the proposed framework, a case study is presented on a news headlines dataset, on which the framework classified news headlines accurately.
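The Bag of Words and TF-IDF representations mentioned above can be sketched as follows. This is an illustrative sketch only, not the framework's code: the function names, the whitespace tokenizer, and the example headlines are assumptions, and the weighting shown is the classic unsmoothed form (term frequency times log(N / document frequency)), whereas common libraries apply smoothed variants.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return one term-count mapping per document."""
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(docs):
    """Return one {term: weight} mapping per document, using
    tf = count / document length and idf = log(N / document frequency)."""
    counts = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document counts a term once
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (n / total) * math.log(n_docs / doc_freq[t]) for t, n in c.items()}
        )
    return weights


# Hypothetical headlines, invented for illustration.
headlines = [
    "Stocks rally as markets rebound",
    "Team wins championship final",
    "Markets slide as stocks fall",
]

bow = bag_of_words(headlines)
weights = tf_idf(headlines)
```

In this toy corpus, a term that occurs in only one headline (such as "rally") receives a higher weight than a term shared by two headlines (such as "stocks"), which is the property that makes TF-IDF vectors useful as input to a classifier such as Logistic Regression.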