Cancer is considered the second leading cause of death in the world, with an estimated 9.6 million deaths in 2018. Early detection of cancer can increase the survival rate and decrease both treatment costs and patient suffering. At the national level, this can reduce the total annual economic cost of healthcare expenditure and lost productivity. Predictive analytics using data mining and machine learning techniques have proven successful for early detection of cancer, identification of high-risk patients, prediction of survival, cancer morbidity, and mortality rates, and prediction of drug response. The aim of this survey paper is to review the important role of data mining and machine learning techniques in the detection of cancer. The paper compares the most popular predictive tools and techniques in terms of types of data, extracted features, error rate, diagnosis, associated factors, and estimation methods.
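The survey compares predictive techniques partly by error rate. As a minimal illustration (not code from the survey), the error rate of a binary cancer classifier, together with the sensitivity and specificity usually reported alongside it, can be computed from a confusion matrix. The names `ConfusionMatrix` and `confusion_from_labels` and the toy labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts from a binary classifier (1 = cancer, 0 = no cancer)."""
    tp: int  # cancer cases correctly flagged
    fp: int  # non-cancer cases wrongly flagged
    tn: int  # non-cancer cases correctly cleared
    fn: int  # cancer cases missed

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def error_rate(self) -> float:
        """Share of all cases classified wrongly: (FP + FN) / N."""
        return (self.fp + self.fn) / self.total

    def sensitivity(self) -> float:
        """Share of actual cancer cases that were detected: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of non-cancer cases correctly cleared: TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)


def confusion_from_labels(y_true: list[int], y_pred: list[int]) -> ConfusionMatrix:
    """Tally a confusion matrix from parallel lists of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=pairs.count((1, 1)),
        fp=pairs.count((0, 1)),
        tn=pairs.count((0, 0)),
        fn=pairs.count((1, 0)),
    )


# Toy labels, invented for illustration only.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
cm = confusion_from_labels(y_true, y_pred)
print(cm.error_rate())   # 0.25
print(cm.specificity())  # 0.8
```

Error rate alone can be misleading when positive cases are rare, which is why sensitivity and specificity are derived from the same four counts.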
e13558 Background: Artificial intelligence (AI) and machine learning (ML) have made outstanding contributions to oncology. One application is the early detection of breast cancer. Recently, several ML and data mining techniques have been used for both detection and classification of breast cancer cases. It has been found that about 25% of breast cancer cases are already aggressive at the time of diagnosis, with metastatic spread. The presence or absence of metastatic spread largely determines the patient's survival. Hence, early detection is very important for reducing cancer mortality rates.

Methods: This study aims to apply ML and data mining, using AI techniques, to explore and preprocess a breast cancer dataset before building an ML classification model for breast cancer metastasis prediction. The model will be implemented for mass screening, to prioritize patients who are more likely to develop metastases. A dataset of breast cancer cases was provided by the Oncology and Nuclear Medicine Department, Faculty of Medicine, Alexandria University. It contains clinical records of 5236 patients diagnosed with breast cancer. ML libraries in the Python programming language were used to explore the dataset: to determine the ratio of missing data, define data types, identify redundant data, and specify the class label and the predictors to be used for the classification model.

Results: The missing-data ratio in some columns exceeds 90%, there are redundant features to be eliminated, and data type conversion and feature reduction should be applied to prepare the data.

Conclusions: Based on these findings, it is recommended to use Python ML preprocessing libraries to prepare the dataset before building an ML classification model for breast cancer metastasis prediction.
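The exploration step described in the Methods can be sketched with pandas. This is a minimal example under stated assumptions, not the study's code: the function `profile_dataset`, the 90% threshold parameter, and the toy column names (`age`, `tumor_size_mm`, `her2_status`, `age_copy`, `metastasis`) are hypothetical, since the real dataset's schema is not given. It reports per-column missing ratios, data types, duplicate records, and redundant columns, then separates the class label from candidate predictors.

```python
import pandas as pd


def profile_dataset(df: pd.DataFrame, missing_threshold: float = 0.9) -> dict:
    """Summarize common data-quality problems before building a classifier."""
    # Fraction of missing values in each column.
    missing_ratio = df.isna().mean()
    # Columns missing more than `missing_threshold` of their values.
    high_missing = missing_ratio[missing_ratio > missing_threshold].index.tolist()
    # Columns with at most one distinct non-missing value carry no information.
    constant_cols = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
    # Columns identical (values and dtype) to an earlier column are redundant.
    cols = list(df.columns)
    duplicate_cols = [
        c for i, c in enumerate(cols) if any(df[c].equals(df[p]) for p in cols[:i])
    ]
    return {
        "missing_ratio": missing_ratio.to_dict(),
        "high_missing": high_missing,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "constant_cols": constant_cols,
        "duplicate_cols": duplicate_cols,
    }


# Hypothetical toy records; the real dataset's column names are not given.
records = pd.DataFrame({
    "age": [45, 52, 61, 45],
    "tumor_size_mm": ["20", "35", None, "20"],  # numbers stored as text
    "her2_status": [None, None, None, None],     # entirely missing
    "age_copy": [45, 52, 61, 45],                # redundant copy of "age"
    "metastasis": [0, 1, 1, 0],                  # class label
})
report = profile_dataset(records)

# Class label and candidate predictors (problem columns excluded).
label = "metastasis"
drop = set(report["high_missing"]) | set(report["constant_cols"]) | set(report["duplicate_cols"])
predictors = [c for c in records.columns if c != label and c not in drop]
print(predictors)  # ['age', 'tumor_size_mm']
```

Reporting problems before fixing them keeps the exploration step auditable: a clinician can review which fields are flagged as empty or redundant before any are dropped.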
In 2020, breast cancer caused 685,000 deaths worldwide, and cancer as a whole is considered the second leading cause of death globally. Because diagnosing Metastatic Breast Cancer (MBC) is challenging, a prediction tool is needed during the diagnosis stage to identify and prioritize patients who are more likely to develop metastasis and to provide them with optimal palliative or supportive care. Machine Learning (ML), a subset of Artificial Intelligence (AI), has been applied in oncology for early detection of cancer, identification of high-risk patients, prediction of survival, cancer morbidity, and mortality rates, and prediction of drug response. One of the main applications of ML in public health is identifying and predicting populations at high risk of specific adverse health outcomes, and developing appropriate targeted health interventions. Better data quality is crucial for better patient targeting and informed decision-making, and more data of sufficient quality generally yields better ML model performance. Noisy or unclean data may lead to inaccurate or faulty predictions, which is critical in the medical field. Consequently, data quality is essential for good ML model performance. The aim of this research is to determine the key challenges of using raw datasets and to illustrate how ML techniques can be used to explore and preprocess a dataset to overcome these challenges.
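A minimal sketch of such preprocessing with pandas follows. It is a generic sequence assumed for illustration, not the authors' pipeline: it drops mostly-empty columns, removes duplicate and unlabeled records, converts numeric text to numbers, and fills remaining numeric gaps with the column median. The function `preprocess`, the 90% threshold, and the column names are hypothetical.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, label: str, missing_threshold: float = 0.9) -> pd.DataFrame:
    """Return a cleaned copy of `df` for training a classifier on `label`."""
    # 1. Drop predictor columns that are mostly empty (the label is always kept).
    missing_ratio = df.isna().mean()
    keep = [c for c in df.columns if c == label or missing_ratio[c] <= missing_threshold]
    # 2. Drop exact duplicate records and 3. records without a class label.
    out = df[keep].drop_duplicates().dropna(subset=[label]).copy()
    # 4. Convert text columns to numbers when every non-missing value parses.
    for col in out.columns:
        if col == label:
            continue
        converted = pd.to_numeric(out[col], errors="coerce")
        if converted.notna().sum() == out[col].notna().sum():
            out[col] = converted
    # 5. Fill remaining gaps in numeric predictors with the column median.
    numeric_cols = out.select_dtypes(include="number").columns.drop(label, errors="ignore")
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out.reset_index(drop=True)


# Hypothetical raw records showing typical raw-data problems.
raw = pd.DataFrame({
    "age": ["45", "52", None, "45", "70"],          # numbers stored as text
    "her2_status": [None, None, None, None, None],  # 100% missing
    "er_status": ["pos", "neg", "pos", "pos", "neg"],
    "metastasis": [0, 1, 1, 0, None],               # last record has no label
})
clean = preprocess(raw, label="metastasis")
print(clean["age"].tolist())  # [45.0, 52.0, 48.5]
```

Median imputation is only one option: it is robust to outliers but ignores relationships between features, and in a real model the medians should be computed on the training split only to avoid leaking information from the test data.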