Background
Artificial intelligence and digital health care advanced substantially during the prolonged COVID-19 global pandemic, improving medical diagnosis and treatment. In this study, we describe the development of prediction models for the self-diagnosis of polycystic ovary syndrome (PCOS) using machine learning techniques.
Objective
We aim to develop PCOS prediction models for two groups of users: potential patients and clinical providers. For potential patients, the prediction is based only on noninvasive measures such as anthropometric measures, symptoms, age, and other lifestyle factors, so that the proposed prediction tool can be used conveniently without any laboratory or ultrasound test results. For clinical providers, who can access patients’ medical test results, prediction models using all predictor variables can be adopted to help diagnose patients with PCOS. Throughout this paper, we call the former the patient model and the latter the provider model. We compare the two prediction models using various error metrics.
Methods
In this retrospective study, we acquired and analyzed a publicly available data set containing health information on 541 women, including PCOS status, collected from 10 different hospitals in Kerala, India. We adopted the CatBoost method for classification, k-fold cross-validation for estimating model performance, and SHAP (SHapley Additive exPlanations) values to explain the importance of each variable. In our subgroup study, we used k-means clustering and principal component analysis to split the data set into 2 distinct BMI subgroups and compared the prediction results and the feature importance between the 2 subgroups.
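As a rough illustration of how k-fold cross-validation estimates held-out accuracy, the following minimal Python sketch may help. It is not the study's pipeline: the helper names (`k_fold_indices`, `cross_validated_accuracy`, `fit_midpoint`, `predict_midpoint`) are hypothetical, a simple one-feature threshold rule stands in for CatBoost, the choice of 5 folds is an assumption, and the data are synthetic.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds.

    Fold sizes differ by at most one, and every index lands in exactly one fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(X, y, fit, predict, k=5, seed=0):
    """Return one held-out accuracy per fold for the given fit/predict pair."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(X)) if i not in held_out_set]
        # Fit only on the training folds, then score on the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Stand-in classifier (NOT CatBoost): threshold a single feature at the
# midpoint between the two class means seen in the training folds.
def fit_midpoint(X_train, y_train):
    pos = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    neg = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_midpoint(threshold, x):
    # Assumes the positive class has the larger feature values.
    return 1 if x[0] > threshold else 0


# Synthetic, hypothetical data: one feature, two well-separated classes.
X_toy = [[float(v)] for v in list(range(10)) + list(range(100, 110))]
y_toy = [0] * 10 + [1] * 10

fold_scores = cross_validated_accuracy(
    X_toy, y_toy, fit_midpoint, predict_midpoint, k=5
)
mean_accuracy = sum(fold_scores) / len(fold_scores)
```

Because `fit` and `predict` are passed in as functions, the same loop could in principle wrap any classifier, including a gradient-boosting model, without changing how the folds are formed or scored.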
Results
The patient models achieved 81% to 82.5% prediction accuracy for PCOS status without any invasive measures, and the provider models achieved 87.5% to 90.1% prediction accuracy using both noninvasive and invasive predictor variables. Among noninvasive measures, acanthosis nigricans, acne, hirsutism, irregular menstrual cycle, length of menstrual cycle, weight gain, fast food consumption, and age were the more important variables in the models. Among medical test results, the numbers of follicles in the right and left ovaries and anti-Müllerian hormone ranked highly in feature importance. We also report more detailed results from a subgroup study.
Conclusions
The proposed prediction models are ultimately expected to serve as a convenient digital platform through which users can obtain a pre- or self-diagnosis and counseling on their risk of PCOS, with or without medical test results. The platform would allow women to assess their risk at home, without delay, before they seek further medical care. Clinical providers can also use the proposed prediction tool to help diagnose PCOS in women.