In almost all countries, precautionary measures are less expensive than medical treatment. Detecting a disease early gives a patient a better chance of successful treatment than discovering it at an advanced stage. Even when we do not know how to cure patients, any treatment we can provide is useful and gives them a more comfortable life. Cervical cancer is one such disease; it is the fourth most common type of cancer in women worldwide. Many factors increase the risk of cervical cancer, such as age and the use of hormonal contraceptives. Early detection of cervical cancer helps raise recovery rates and reduce death rates. This paper aims to use machine learning algorithms to find a model capable of diagnosing cervical cancer with high accuracy and sensitivity. The cervical cancer risk factor dataset from the University of California at Irvine (UCI) was used to construct the classification model through a voting method that combines three classifiers: decision tree, logistic regression and random forest. The synthetic minority oversampling technique (SMOTE) was used to address the imbalanced dataset and, together with principal component analysis (PCA), to reduce dimensions that do not affect model accuracy. Stratified 10-fold cross-validation was then used to prevent overfitting. This dataset contains four target variables (Hinselmann, Schiller, Cytology, and Biopsy) and 32 risk factors. We found that using the voting classifier together with SMOTE and PCA raised the accuracy, sensitivity, and area under the Receiver Operating Characteristic curve (ROC_AUC) of the predictive models created for each of the four target variables. In the SMOTE-voting model, accuracy, sensitivity and PPA ratios improved by 0.93% to 5.13%, 39.26% to 46.97% and 2% to 29%, respectively, for all target variables.
Moreover, using PCA reduced computational processing time and increased model efficiency. Finally, a comparison with several previous studies showed that our models diagnosed cervical cancer more efficiently according to certain evaluation measures.
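The two core ideas in this abstract, SMOTE oversampling and classifier voting, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `smote` and `hard_vote`, the toy points, the neighbour count `k=3`, and the choice of hard (majority) voting are all assumptions made for illustration, since the abstract does not say whether hard or soft voting was used.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a sampled
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def hard_vote(predictions):
    """Return the majority label among the classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data: four minority-class points in two dimensions.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)

# Hypothetical predictions from three classifiers for one sample.
label = hard_vote(["positive", "negative", "positive"])
```

Each synthetic point lies on a line segment between two existing minority points, so oversampling adds no values outside the range already observed for that class.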
The internet era creates new types of large, real-time data, much of which is non-standard, such as streaming and sensor-generated data. Advanced big data technologies enable organizations to extract insights from sophisticated data. Volume, variety and velocity are the main big data challenges; they cause difficulties in capture, storage, search, sharing, analysis and visualization. Therefore, technologies like NoSQL, Hadoop and cloud computing are used to extract value from large volumes and a wide variety of data in order to discover business needs. This article focuses on the challenges of big data and on how recent technologies can be used to address them, illustrated through real-world case studies. The article also presents the lessons learned from these case studies.
The appearance of big data has created new challenges for data analysis teams, especially when dealing with unstructured data in text form. Many applications increasingly include a large amount of this type of data. An example of such data is data collected from Twitter. Adequate use of Machine Learning (ML), big data tools and social media platforms can solve several problems. The aim of this research is to apply sentiment analysis to Arabic tweets about tourism in Saudi Arabia and to determine the most visited places. The Ara Senti corpus was used as the labelled data for training the sentiment analysis models and for dealing with Arabic morphology. Three-class classification (Positive, Negative, or Neutral) was performed using Decision Tree, Random Forest, Logistic Regression and Naïve Bayes. The results showed that the highest performance achieved was 86%, obtained by Logistic Regression with the Term Frequency-Inverse Document Frequency (TF-IDF) representation and by Naïve Bayes with the Bag-of-Words model; both outperformed random forest and decision tree. The trained classifier was applied to predict classes on data collected from Twitter that review Kingdom of Saudi Arabia (KSA) destinations, to finally present a rating of the most visited places in KSA. The five most visited places in Saudi Arabia are Riyadh, Alula, Hail, Taif and Tabuk.
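As a rough illustration of the TF-IDF representation mentioned above, the sketch below weights each term by its frequency within a document multiplied by the logarithm of the inverse document frequency. The function name `tf_idf` and the English toy tokens are hypothetical; the study itself works on Arabic tweets, and library implementations often use smoothed variants of this formula.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # term frequency * inverse document frequency (unsmoothed)
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenised documents standing in for preprocessed tweets.
docs = [
    ["great", "trip", "to", "alula"],
    ["bad", "trip"],
    ["great", "view"],
]
weights = tf_idf(docs)
```

Terms that occur in few documents, such as "alula" here, receive a higher weight than terms shared across documents, such as "trip", which is what lets a classifier focus on the more distinctive words.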
Mining big data is currently getting a lot of attention because businesses need more complex information in order to increase their revenue and gain competitive advantage. Mining huge amounts of data, as well as mining real-time data, therefore requires new data mining techniques and approaches. This chapter discusses big data volume, variety and velocity, data mining techniques, and open source tools for handling very large datasets. Moreover, the chapter focuses on two industrial areas, telecommunications and healthcare, and on the lessons learned from them.