In 2020, breast cancer caused 685,000 deaths worldwide, and cancer is considered the second leading cause of death globally. Because diagnosing Metastatic Breast Cancer (MBC) is challenging, a prediction tool is needed at the diagnosis stage to identify and prioritize patients who are more likely to develop metastasis and to provide them with optimal palliative or supportive care. Machine Learning (ML), a subset of Artificial Intelligence (AI), has been applied in oncology for early detection of cancer, for identifying high-risk patients and predicting survival, cancer morbidity, and mortality rates, and for predicting drug response. One of the main applications of ML in public health is the identification and prediction of populations at high risk of developing specific adverse health outcomes, and the development of appropriately targeted health interventions. Better data quality is crucial for better patient targeting and informed decision-making. Likewise, the more sufficient, high-quality data are available, the better an ML model performs. Noisy or unclean data may lead to inaccurate or faulty predictions, a serious risk in the medical field, so data quality is essential for ML model performance. The aim of this research is to determine the key challenges of using raw datasets and to illustrate how ML techniques can be used to explore and preprocess a dataset to overcome these challenges.