Background Artificial intelligence and digital health care advanced substantially during the prolonged COVID-19 global pandemic, improving medical diagnosis and treatment. In this study, we discuss the development of prediction models for the self-diagnosis of polycystic ovary syndrome (PCOS) using machine learning techniques.

Objective We aim to develop self-diagnostic prediction models for PCOS for two groups of users: potential patients and clinical providers. For potential patients, the prediction is based only on noninvasive measures such as anthropometric measures, symptoms, age, and other lifestyle factors, so that the proposed prediction tool can be used conveniently without any laboratory or ultrasound test results. For clinical providers, who can access patients' medical test results, prediction models using all predictor variables can be adopted to help diagnose patients with PCOS. Throughout this paper, we call the former the patient model and the latter the provider model. We compare both prediction models using various error metrics.

Methods In this retrospective study, a publicly available data set of health information, including PCOS status, from 541 women was acquired and used for analysis. The data were collected from 10 different hospitals in Kerala, India. We adopted the CatBoost method for classification, k-fold cross-validation for estimating the performance of the models, and SHAP (SHapley Additive exPlanations) values to explain the importance of each variable. In our subgroup study, we used k-means clustering and principal component analysis to split the data set into 2 distinct BMI subgroups and compared the prediction results and the feature importance between the 2 subgroups.

Results We achieved 81% to 82.5% prediction accuracy for PCOS status without any invasive measures in the patient models, and 87.5% to 90.1% prediction accuracy using both noninvasive and invasive predictor variables in the provider models. Among noninvasive measures, acanthosis nigricans, acne, hirsutism, irregular menstrual cycle, length of menstrual cycle, weight gain, fast food consumption, and age were the more important variables in the models. Among medical test results, the numbers of follicles in the right and left ovaries and anti-Müllerian hormone ranked highly in feature importance. We also report more detailed results from a subgroup study.

Conclusions The proposed prediction models are ultimately expected to serve as a convenient digital platform through which users can obtain a pre- or self-diagnosis and counseling on their risk of PCOS, with or without medical test results. The platform will enable women to assess their risk conveniently at home, without delay, before they seek further medical care. Clinical providers can also use the proposed prediction tool to help diagnose PCOS in women.
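The k-fold cross-validation step named in the Methods can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' pipeline: the function names `k_fold_indices` and `cross_val_accuracy`, the `fit`/`predict` callables, and the choice of k are hypothetical, and the abstract does not state which value of k was used. The idea is that each sample is held out exactly once, and the held-out accuracies are averaged.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Mean held-out accuracy over k folds.

    `fit(X_train, y_train)` returns a model and `predict(model, x)` returns
    a label; both are placeholders for any classifier (an assumption here).
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

With 541 samples and an assumed k of 5, this split gives four folds of 108 samples and one of 109. For a data set with unequal class sizes, a stratified split that preserves the PCOS proportion in each fold may be preferable.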
BACKGROUND With the prolonged COVID-19 global pandemic, artificial intelligence and digital health care have advanced substantially to improve diagnosis and treatment. In this study, we discuss the development of a digital prediction tool for the diagnosis of polycystic ovary syndrome (PCOS) using machine learning techniques.

OBJECTIVE We aim to develop a self-diagnostic digital prediction platform for PCOS based on noninvasive measures such as age, lifestyle, anthropometric measures, and symptoms, which do not require any laboratory or ultrasound results.

METHODS In this retrospective study, a publicly available data set of health information, including PCOS status, from 541 women in Kerala, India, was acquired and used for analysis. Principal component analysis and k-means clustering were adopted to partition the sample into four subgroups based on anthropometric measures. A random forest classifier was then fitted to each subgroup to predict PCOS from noninvasive measures, and the prediction errors were estimated. Important predictors for diagnosing PCOS were identified for each subgroup.

RESULTS By adopting subgroup models, we reduced the prediction error rate by 11.84% across the subgroups compared with applying a single model to the entire sample. The mean precision, sensitivity, accuracy, and F1-score for the diagnosis of PCOS using the proposed subgroup models were 0.950, 0.972, 0.940, and 0.961, respectively. No invasive measures were used in this prediction.

CONCLUSIONS Anthropometric measures were found to be important variables for classifying the women into subgroups. The results of the proposed subgroup models suggest that PCOS can be predicted more accurately when different models are used for different subgroups than when a single model is used for the whole sample. This work enables women to access the proposed self-diagnosis prediction platform conveniently at home, without delay, before they seek further medical care.

CLINICALTRIAL This is an observational study.
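The four metrics reported in the RESULTS (precision, sensitivity, accuracy, and F1-score) all follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions, assuming labels are coded 1 for PCOS and 0 otherwise; the function name `classification_metrics` and the label coding are illustrative assumptions, not taken from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, sensitivity (recall), accuracy and F1 for binary labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    # Ratios with an empty denominator are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Toy labels for illustration only; these are not data from the study.
example = classification_metrics(
    [1, 1, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
)
```

Because the F1-score is the harmonic mean of precision and sensitivity, it always lies between the two. The figures above are averages across subgroups, so the mean F1-score need not equal the F1-score computed from the mean precision and mean sensitivity.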