e13558 Background: Artificial intelligence (AI) and machine learning (ML) have made outstanding contributions to oncology. One application is the early detection of breast cancer. Recently, several ML and data mining techniques have been used for both detection and classification of breast cancer cases. About 25% of breast cancer cases are found to have aggressive cancer with metastatic spread at the time of diagnosis. The absence or presence of metastatic spread largely determines the patient’s survival. Hence, early detection is very important for reducing cancer mortality rates. Methods: This study applies ML and data mining techniques to explore and preprocess a breast cancer dataset before building an ML classification model for breast cancer metastasis prediction. The model will be implemented for mass screening, to prioritize patients who are more likely to develop metastases. The dataset of breast cancer cases was provided by the Oncology and Nuclear Medicine Department, Faculty of Medicine, Alexandria University. It contains the clinical records of 5236 patients diagnosed with breast cancer. ML libraries in the Python programming language were used to explore the dataset: to determine the ratio of missing data, define data types, identify redundant data, and specify the class label and the predictors to be used for the classification model. Results: The results showed that the missing data ratio in some columns exceeds 90%, that there are redundant features to be eliminated, and that data type conversion and feature reduction should be applied to prepare the data. Conclusions: Based on these findings, it is recommended to use Python ML preprocessing libraries to prepare the dataset before building the ML classification model for breast cancer metastasis prediction.
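The exploration steps described in the Methods (missing-data ratio, data types, redundant features, class label) can be illustrated with a minimal sketch. It assumes pandas as the Python library, which the abstract does not name. The column names, the toy values, the `explore` and `prepare` helpers, and the 0.9 threshold (chosen to mirror the 90% figure in the Results) are all hypothetical and do not come from the Alexandria University dataset.

```python
import pandas as pd


def explore(df: pd.DataFrame, missing_threshold: float = 0.9) -> dict:
    """Report missing-data ratios, data types, and redundant columns."""
    missing_ratio = df.isna().mean()  # fraction of missing values per column
    high_missing = missing_ratio[missing_ratio > missing_threshold].index.tolist()
    # A column with at most one distinct non-missing value carries no signal.
    constant = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
    # A column is a duplicate if it exactly equals an earlier column.
    cols = list(df.columns)
    duplicates = []
    for i, col in enumerate(cols):
        if any(df[col].equals(df[prev]) for prev in cols[:i]):
            duplicates.append(col)
    return {
        "missing_ratio": missing_ratio.to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "high_missing": high_missing,
        "constant": constant,
        "duplicates": duplicates,
    }


def prepare(df: pd.DataFrame, report: dict, label: str) -> pd.DataFrame:
    """Drop flagged columns and convert numeric-looking text to numbers."""
    drop = set(report["high_missing"]) | set(report["constant"]) | set(report["duplicates"])
    drop.discard(label)  # never drop the class label
    out = df.drop(columns=sorted(drop))
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            converted = pd.to_numeric(out[col], errors="coerce")
            # Convert only when every non-missing value parsed as a number.
            if converted.notna().sum() == out[col].notna().sum():
                out[col] = converted
    return out


# Hypothetical toy records, for illustration only.
sizes = ["12", "25", "30", "8", "41", "19", "22", "35", "15", "27", "33", "18"]
toy = pd.DataFrame({
    "age": [45, 52, 61, 38, 70, 49, 55, 63, 41, 58, 66, 47],
    "tumor_size_mm": sizes,          # numbers stored as text
    "tumor_size_dup": list(sizes),   # redundant copy of the column above
    "rare_marker": [None] * 11 + [1.0],  # more than 90% missing
    "site": ["breast"] * 12,         # constant column
    "metastasis": [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0],  # class label
})

report = explore(toy)
clean = prepare(toy, report, label="metastasis")
print(report["high_missing"], report["duplicates"])
print(clean.dtypes)
```

In this sketch the class label is passed explicitly so that it is never dropped, and text columns are converted only when every non-missing value parses as a number, which avoids silently turning genuine categories into missing values.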