2020
DOI: 10.3233/jcs-191362

Overfitting, robustness, and malicious algorithms: A study of potential causes of privacy risk in machine learning

Abstract: Machine learning algorithms, when applied to sensitive data, pose a distinct threat to privacy. A growing body of prior work demonstrates that models produced by these algorithms may leak specific private information in the training data to an attacker, either through the models' structure or their observable behavior. This article examines the factors that can allow a training set membership inference attacker or an attribute inference attacker to learn such information. Using both formal and empirical analys…

Cited by 33 publications (43 citation statements). References 40 publications.

“…While adversarial training produces models that are more invariant to small changes in their inputs, these results show that the training procedure itself can be unstable. This may be related to prior work demonstrating that adversarially-trained models are more vulnerable to membership inference [54,63], a privacy attack that exploits memorization to leak information about training data. While membership vulnerability does not necessarily imply greater LUF, these experiments show that in many cases the two phenomena may be related.…”
Section: LUF and Robust Classification (mentioning)
confidence: 69%
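
As context for this statement, here is a minimal sketch of FGSM-style adversarial training, the kind of procedure the citing authors refer to: each batch is perturbed in the direction of the input-loss gradient before the usual parameter update, which encourages invariance to small input changes. PyTorch, the model/optimizer objects, and the epsilon value are assumptions for illustration, not details taken from either paper.

```python
import torch
import torch.nn as nn

def adversarial_training_step(model, x, y, optimizer, epsilon=0.1):
    """One FGSM-style adversarial training step (illustrative sketch)."""
    loss_fn = nn.CrossEntropyLoss()

    # Gradient of the loss with respect to the inputs gives the FGSM direction.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

    # Standard update, but computed on the perturbed inputs.
    optimizer.zero_grad()
    adv_loss = loss_fn(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```
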
“…Instability also worsens concrete privacy attacks: oversensitivity to the training set can affect a model's parameters, which can be leveraged to perform membership inference [38,53,61]. Our experiments in Section 5 may suggest that this phenomenon has a connection to leave-one-out unfairness, in that adversarial training increases both LUF and the potential for membership inference attacks [54,63].…”
Section: Related Work (mentioning)
confidence: 98%
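
The membership inference attacks referenced here can be illustrated with a simple confidence-thresholding sketch: an oversensitive model tends to be more confident on points it was trained on, so thresholding the predicted probability of the true label separates training members from non-members better than chance. scikit-learn, the synthetic dataset, and the threshold value are illustrative assumptions, not the attacks used in the cited work.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_member, X_nonmember, y_member, y_nonmember = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Only the "member" half is used for training, so the model memorizes it.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_member, y_member)

def attack(model, X, y, threshold=0.9):
    # Guess "member" when the model is highly confident in the true label.
    true_label_probs = model.predict_proba(X)[np.arange(len(y)), y]
    return true_label_probs >= threshold

member_rate = attack(model, X_member, y_member).mean()          # true positives
nonmember_rate = attack(model, X_nonmember, y_nonmember).mean() # false positives
print(f"flagged as members: train={member_rate:.2f}, held-out={nonmember_rate:.2f}")
```
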
“…In general, three types of prediction error, namely bias, variance, and irreducible error (noise), are reported in applications of individual ML algorithms [53]. Therefore, ensemble algorithms were built to improve robustness over a single model by combining the predictions of several models [54, 55].…”
Section: Discussion (mentioning)
confidence: 99%
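
The ensemble point above can be made concrete with a small sketch comparing a single decision tree against bagged trees: averaging bootstrap-trained models mainly reduces the variance component of the error, while bias and irreducible noise are largely unaffected. scikit-learn and the synthetic regression task are assumptions for illustration, not part of the cited study.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression task with irreducible noise (noise=10.0).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

single_tree = DecisionTreeRegressor(random_state=0)
bagged_trees = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                                random_state=0)

for name, estimator in [("single tree", single_tree), ("bagged trees", bagged_trees)]:
    scores = cross_val_score(estimator, X, y, cv=5,
                             scoring="neg_mean_squared_error")
    # Bagging typically yields a noticeably lower test MSE (variance reduction).
    print(f"{name}: mean test MSE = {-scores.mean():.1f}")
```
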
“…However, despite the advantages of using ML for answering biological questions, possible issues must be overcome in ML model development and implementation. Overfitting is a common issue when developing ML models [15], whereby an ML model does not generalize well from observed to unseen data. In this instance, while the model may perform well when making predictions on training data, predictions are not accurate when exposed to new data.…”
Section: Introduction (mentioning)
confidence: 99%
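
The overfitting symptom described in this statement (good training performance, poor performance on new data) can be reproduced with a short sketch: an unconstrained decision tree fits its training set almost perfectly but loses accuracy on held-out data, while a depth-limited tree narrows the gap. scikit-learn, the synthetic dataset, and the depth value are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, which an unconstrained tree will memorize.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # unconstrained vs. depth-limited tree
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    print(f"max_depth={depth}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")
```
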