Purpose: Machine learning (ML) refers to algorithms (often called models) that are learned directly from data, that is, from past experience. As algorithms continue to evolve with the exponential growth of computing power and of generated data, the privacy of both algorithms and data has become critically important owing to regulations and intellectual property (IP) rights. It is therefore vital to address the privacy and security of both data and models, together with other performance metrics, when commercializing machine learning models.
Aim: Our aim is to show that privacy-preserving machine learning inference can be deployed to monetize the investment in ML models without disclosing either the models or patients' data. To this end, we provide a security analysis that defines an appropriate per-user query limit, using ESSG's adult spinal deformity dataset.
Method: We implement privacy-preserving tree-based machine learning inference and run two security scenarios (scenario A and scenario B), each consisting of four parts in which the number of synthetic data points used to improve the accuracy of the attacker's substitute model is progressively increased. In each scenario, a target model is trained on data from particular operation site(s), and substitute models are built from the remaining sites' data with threefold cross-validation repeated nine times, using the XGBoost algorithm, to assess the security of the target model. First, we create box plots of the test sets' accuracy, sensitivity, precision, and F-score to compare the substitute models' performance with that of the target model. Second, we compare the gain values of the target and substitute models' features. Third, we provide an in-depth analysis, visualized as a heatmap, of whether the target model's split points appear in the substitute models. Finally, we compare the outputs of the public and privacy-preserving models and report intermediate timing results.
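To make the evaluation procedure concrete, the following Python sketch illustrates repeated threefold cross-validation of substitute XGBoost models. It is a minimal illustration under stated assumptions, not the study's actual pipeline: the dataset, features, and hyperparameters are hypothetical stand-ins, and scikit-learn's RepeatedStratifiedKFold is assumed as the cross-validation utility.

    # Minimal sketch (hypothetical data and hyperparameters), assuming
    # xgboost and scikit-learn are installed; not the study's actual pipeline.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RepeatedStratifiedKFold
    from sklearn.metrics import (accuracy_score, recall_score,
                                 precision_score, f1_score)
    from xgboost import XGBClassifier

    # Stand-in for the attacker's data (remaining sites plus synthetic points).
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # Threefold cross-validation repeated nine times, as in the Method.
    cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=9, random_state=0)
    scores = []
    for train_idx, test_idx in cv.split(X, y):
        substitute = XGBClassifier(n_estimators=100, random_state=0)
        substitute.fit(X[train_idx], y[train_idx])
        pred = substitute.predict(X[test_idx])
        scores.append((accuracy_score(y[test_idx], pred),
                       recall_score(y[test_idx], pred),   # sensitivity
                       precision_score(y[test_idx], pred),
                       f1_score(y[test_idx], pred)))
    print("mean (accuracy, sensitivity, precision, F-score):",
          np.round(np.mean(scores, axis=0), 2))

    # Per-feature gain of one substitute model, for comparison with the
    # target model's gain values.
    print(substitute.get_booster().get_score(importance_type="gain"))

Split points for the heatmap comparison can be extracted in a similar way, for example from the booster's trees_to_dataframe() output.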
Results: In both scenarios, the privacy-preserving XGBoost model's predictions are identical to those of the original plaintext model. The differences between the performance metrics of the best-performing substitute models and the target models are 0.27, 0.18, 0.25, and 0.26 for scenario A, and 0.04, 0, 0.04, and 0.03 for scenario B, for accuracy, sensitivity, precision, and F-score, respectively. The difference between the target model's accuracy and the mean accuracy of the models in each scenario on the substitute models' test dataset is 0.38 for scenario A and 0.14 for scenario B.
Conclusion: Based on our findings, we conclude that machine learning models (i.e., our target models) can contribute to advances in the fields in which they are deployed. When the security of both the model and the user data is guaranteed, it is possible to monetize the modeling efforts of companies or organizations such as ESSG.