Attention-based Transformer models have achieved state-of-the-art results in natural language processing (NLP). However, recent work shows that the underlying attention mechanism can be exploited by adversaries to craft malicious inputs designed to induce spurious outputs, thereby harming model performance and trustworthiness. Unlike in the vision domain, the literature examining neural networks under adversarial conditions in the NLP domain is limited, and most of it focuses on the English language. In this paper, we first analyze the adversarial robustness of Bidirectional Encoder Representations from Transformers (BERT) models on German datasets. Second, we introduce two novel NLP attacks: a character-level and a word-level attack, both of which use attention scores to determine where to inject character-level and word-level noise, respectively. Finally, we present two defense strategies against these attacks. The first, an implicit character-level defense, is a variant of adversarial training that trains a new classifier capable of abstaining from/rejecting certain (ideally adversarial) inputs. The second, an explicit character-level defense, learns a latent representation of the complete training-data vocabulary and maps all tokens of an input example into the same latent space, enabling the replacement of all out-of-vocabulary tokens with the most similar in-vocabulary tokens based on the cosine similarity metric.
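As a rough illustration of how attention scores can guide where to inject character-level noise, the sketch below extracts attention weights from a Hugging Face BERT checkpoint, averages them over layers and heads, and perturbs the most-attended token with an adjacent-character swap. The checkpoint name, the swap operation, and the averaging heuristic are our own assumptions for illustration, not the exact attack procedure described in the paper.

```python
# Hypothetical sketch: attention-guided character-level noise injection.
# The model name, noise type, and scoring heuristic are illustrative assumptions.
import random
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-german-cased"  # assumed German BERT checkpoint; in practice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)  # the fine-tuned target
model = AutoModelForSequenceClassification.from_pretrained(  # classifier would be
    MODEL_NAME, output_attentions=True                       # loaded here
)
model.eval()


def most_attended_token(sentence: str) -> int:
    """Index of the word-piece token that receives the most attention,
    averaged over all layers and heads (special tokens excluded)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    # out.attentions: tuple(num_layers) of (batch, heads, seq, seq)
    att = torch.stack(out.attentions).mean(dim=(0, 2))[0]  # (seq, seq)
    scores = att.sum(dim=0)          # total attention each token receives
    scores[0] = scores[-1] = 0.0     # ignore [CLS] and [SEP]
    return int(scores.argmax())


def perturb_token(sentence: str) -> str:
    """Inject character-level noise (swap two adjacent characters)
    into the most-attended token."""
    idx = most_attended_token(sentence)
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    token = tokens[idx].lstrip("#")
    if len(token) > 1 and token in sentence:
        i = random.randrange(len(token) - 1)
        noisy = token[:i] + token[i + 1] + token[i] + token[i + 2:]
        return sentence.replace(token, noisy, 1)
    return sentence


print(perturb_token("Das ist ein harmloser Beispielsatz."))
```

A word-level variant would follow the same scoring step but replace or drop the most-attended word instead of corrupting its characters.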
In this work, we evaluate the adversarial robustness of BERT [1] models trained on German hate speech datasets. We complement our evaluation with two novel white-box character-level and word-level attacks, thereby extending the range of attacks available. Furthermore, we compare two novel character-level defense strategies and evaluate their relative robustness.
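To make the explicit character-level defense more concrete, the following sketch embeds the training vocabulary with character n-gram TF-IDF vectors (a stand-in for the latent representation actually learned in the paper) and replaces each out-of-vocabulary token with its most cosine-similar in-vocabulary neighbour. The class and variable names are illustrative assumptions, not part of the published method.

```python
# Hypothetical sketch of the explicit character-level defense: embed the training
# vocabulary in a latent space and map every out-of-vocabulary (OOV) token to its
# nearest in-vocabulary neighbour by cosine similarity. Character n-gram TF-IDF
# vectors stand in for the learned latent representation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class OOVNormalizer:
    def __init__(self, vocabulary: list[str]):
        self.vocabulary = vocabulary
        # Character 2-4-grams are robust to typo-style perturbations.
        self.vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
        self.vocab_vectors = self.vectorizer.fit_transform(vocabulary)
        self.known = set(vocabulary)

    def normalize(self, text: str) -> str:
        cleaned = []
        for token in text.split():
            if token in self.known:
                cleaned.append(token)
                continue
            # Map the OOV token into the same latent space and substitute
            # the most similar in-vocabulary token.
            vec = self.vectorizer.transform([token])
            sims = cosine_similarity(vec, self.vocab_vectors)[0]
            cleaned.append(self.vocabulary[int(sims.argmax())])
        return " ".join(cleaned)


vocab = ["das", "ist", "ein", "harmloser", "beispielsatz"]
normalizer = OOVNormalizer(vocab)
print(normalizer.normalize("das ist ein harmolser beispielsatz"))
```

In this toy example, the perturbed token "harmolser" is mapped back to "harmloser", undoing exactly the kind of character-level noise the attack sketch above introduces.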