Mikhail Yurochkin scite author profile

Mikhail Yurochkin

5Publications

37Citation Statements Received

149Citation Statements Given

How they've been cited

How they cite others

102

134

Affiliations

IBM (United States)

Publications

Order By: Most citations

Your fairness may vary: Pretrained language model fairness in toxic text classification

Baldini¹,

Wei²,

Ramamurthy³

et al. 2022

View full text Add to dashboard Cite

The popularity of pretrained language models in natural language processing systems calls for a careful evaluation of such models in down-stream tasks, which have a higher potential for societal impact. The evaluation of such systems usually focuses on accuracy measures. Our findings in this paper call for attention to be paid to fairness measures as well. Through the analysis of more than a dozen pretrained language models of varying sizes on two toxic text classification tasks (English), we demonstrate that focusing on accuracy measures alone can lead to models with wide variation in fairness characteristics. Specifically, we observe that fairness can vary even more than accuracy with increasing training data size and different random initializations. At the same time, we find that little of the fairness variation is explained by model size, despite claims in the literature. To improve model fairness without retraining, we show that two post-processing methods developed for structured, tabular data can be successfully applied to a range of pretrained language models.Warning: This paper contains samples of offensive text.

show abstract

Rewiring with Positional Encodings for Graph Neural Networks

Brüel-Gabrielsson¹,

Yurochkin²,

Solomon³

2022

Preprint

View full text Add to dashboard Cite

Several recent works use positional encodings to extend the receptive fields of graph neural network (GNN) layers equipped with attention mechanisms. These techniques, however, extend receptive fields to the complete graph, at substantial computational cost and risking a change in the inductive biases of conventional GNNs, or require complex architecture adjustments. As a conservative alternative, we use positional encodings to expand receptive fields to any r-ring. Our method augments the input graph with additional nodes/edges and uses positional encodings as node and/or edge features. Thus, it is compatible with many existing GNN architectures. We also provide examples of positional encodings that are non-invasive, i.e., there is a oneto-one map between the original and the modified graphs. Our experiments demonstrate that extending receptive fields via positional encodings and a virtual fully-connected node significantly improves GNN performance and alleviates over-squashing using small r. We obtain improvements across models, showing state-ofthe-art performance even using older architectures than recent Transformer models adapted to graphs.

show abstract

Training individually fair ML models with Sensitive Subspace Robustness

Yurochkin¹,

Bower²,

Sun³

2019

Preprint

View full text Add to dashboard Cite

We consider an approach to training machine learning systems that are fair in the sense that their performance is invariant under certain perturbations to the features. For example, the performance of a resume screening system should be invariant under changes to the name of the applicant or switching the gender pronouns. We connect this intuitive notion of algorithmic fairness to individual fairness and study how to certify ML algorithms as algorithmically fair. We also demonstrate the effectiveness of our approach on three machine learning tasks that are susceptible to gender and racial biases.

show abstract

Model Fusion with Kullback--Leibler Divergence

Yurochkin¹,

Ghosh²,

Solomon³

2020

Preprint

View full text Add to dashboard Cite

Measuring the robustness of Gaussian processes to kernel choice

Stephenson¹,

Ghosh²,

Nguyen³

et al. 2021

Preprint

View full text Add to dashboard Cite

Gaussian processes (GPs) are used to make medical and scientific decisions, including in cardiac care and monitoring of carbon dioxide emissions. But the choice of GP kernel is often somewhat arbitrary. In particular, uncountably many kernels typically align with qualitative prior knowledge (e.g. function smoothness or stationarity). But in practice, data analysts choose among a handful of convenient standard kernels (e.g. squared exponential). In the present work, we ask: Would decisions made with a GP differ under other, qualitatively interchangeable kernels? We show how to formulate this sensitivity analysis as a constrained optimization problem over a finite-dimensional space. We can then use standard optimizers to identify substantive changes in relevant decisions made with a GP. We demonstrate in both synthetic and real-world examples that decisions made with a GP can exhibit substantial sensitivity to kernel choice, even when prior draws are qualitatively interchangeable to a user.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.