Breast cancer is one of the most common cancers in women all over the world. Due to the improvement of medical treatments, most of the breast cancer patients would be in remission. However, the patients have to face the next challenge, the recurrence of breast cancer which may cause more severe effects, and even death. The prediction of breast cancer recurrence is crucial for reducing mortality. This paper proposes a prediction model for the recurrence of breast cancer based on clinical nominal and numeric features. In this study, our data consist of 1,061 patients from Breast Cancer Registry from Shin Kong Wu Ho-Su Memorial Hospital between 2011 and 2016, in which 37 records are denoted as breast cancer recurrence. Each record has 85 features. Our approach consists of three stages. First, we perform data preprocessing and feature selection techniques to consolidate the dataset. Among all features, six features are identified for further processing in the following stages. Next, we apply resampling techniques to resolve the issue of class imbalance. Finally, we construct two classifiers, AdaBoost and cost-sensitive learning, to predict the risk of recurrence and carry out the performance evaluation in three-fold cross-validation. By applying the AdaBoost method, we achieve accuracy of 0.973 and sensitivity of 0.675. By combining the AdaBoost and cost-sensitive method of our model, we achieve a reasonable accuracy of 0.468 and substantially high sensitivity of 0.947 which guarantee almost no false dismissal. Our model can be used as a supporting tool in the setting and evaluation of the follow-up visit for early intervention and more advanced treatments to lower cancer mortality.
The International Statistical Classification of Disease and Related Health Problems (ICD) is an international standard system for categorizing and reporting diseases, injuries, disorders, and health conditions. Most previously-proposed disease predicting systems need clinical information collected by the medical staff from the patients in hospitals. In this paper, we propose a deep learning algorithm to classify disease types and identify diagnostic codes by using only the subjective component of progress notes in medical records. In this study, we have a dataset, consisting of about one hundred and sixty-eight thousand medical records, from a medical center, collected during 2003 and 2017. First, we apply standard text processing procedures to parse the sentences and word embedding techniques for vector representations. Next, we build a convolution neural network model on the medical records to predict the ICD-9 code by using a subjective component of the progress note. The prediction performance is evaluated by ten-fold cross-validation and yields an accuracy of 0.409, recall of 0.409 and precision of 0.436. If we only consider the “chapter match” of ICD-9 code, our model achieves an accuracy of 0.580, recall of 0.580, and precision of 0.582. Since our diagnostic code prediction model is solely based on subjective components (mainly, patients’ self-report descriptions), the proposed approach could serve as a remote and self-diagnosis assistance tool, prior to seeking medical advice or going to the hospital. In addition, our work may be used as a primary evaluation tool for discomfort in the rural area where medical resources are restricted.
Breast cancer is one of the most common cause of cancer mortality. Early detection through mammography screening could significantly reduce mortality from breast cancer. However, most of screening methods may consume large amount of resources. We propose a computational model, which is solely based on personal health information, for breast cancer risk assessment. Our model can be served as a pre-screening program in the low-cost setting. In our study, the data set, consisting of 3976 records, is collected from Taipei City Hospital starting from 2008.1.1 to 2008.12.31. Based on the dataset, we first apply the sampling techniques and dimension reduction method to preprocess the testing data. Then, we construct various kinds of classifiers (including basic classifiers, ensemble methods, and cost-sensitive methods) to predict the risk. The cost-sensitive method with random forest classifier is able to achieve recall (or sensitivity) as 100 %. At the recall of 100 %, the precision (positive predictive value, PPV), and specificity of cost-sensitive method with random forest classifier was 2.9 % and 14.87 %, respectively. In our study, we build a breast cancer risk assessment model by using the data mining techniques. Our model has the potential to be served as an assisting tool in the breast cancer screening.
Effectively handling the limited number of surgery operating rooms equipped with expensive equipment is a challenging task for hospital management such as reducing the case-time duration and reducing idle time. Improving the efficiency of operating room usage via reducing the idle time with better scheduling would rely on accurate estimation of surgery duration. Our model can achieve a good prediction result on surgery duration with a dozen of features. We have found the result of our best performing department-specific XGBoost model with the values 31.6 min, 18.71 min, 0.71, 28% and 27% for the metrics of root-mean-square error (RMSE), mean absolute error (MAE), coefficient of determination (R2), mean absolute percentage error (MAPE) and proportion of estimated result within 10% variation, respectively. We have presented each department-specific result with our estimated results between 5 and 10 min deviation would be more informative to the users in the real application. Our study shows comparable performance with previous studies, and the machine learning methods use fewer features that are better suited for universal usability.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.