Slang Detection and Identification

Pei, Zhengqi; Sun, Zhewei; Xu, Yang

doi:10.18653/v1/k19-1082

Cited by 22 publications

(20 citation statements)

References 15 publications


“…In their study, both the spelling of a word and its context are provided as input to a translation model to decode a definition sentence. Pei et al (2019) proposed end-to-end neural models to detect and identify slang automatically in natural sentences. Kulkarni and Wang (2018) have proposed computational models that derive novel word forms of slang from spellings of existing words.…”

Section: Related Work (mentioning)

confidence: 99%

“…Here, we focus on syntax and linguistic context, although our framework should allow for the incorporation of social or extra-linguistic features as well. Recent work has found that the flexibility of slang is reflected prominently in syntactic shift (Pei et al, 2019). For example, ice, most commonly used as a noun, is used as a verb to express "to kill" (in Figure 1).…”

Section: Introduction (mentioning)

confidence: 99%


A Computational Framework for Slang Generation

Sun, Zemel, 2021

Transactions of the Association for Computational Linguistics

Self Cite


Slang is a common type of informal language, but its flexible nature and paucity of data resources present challenges for existing natural language systems. We take an initial step toward machine generation of slang by developing a framework that models the speaker’s word choice in slang context. Our framework encodes novel slang meaning by relating the conventional and slang senses of a word while incorporating syntactic and contextual knowledge in slang usage. We construct the framework using a combination of probabilistic inference and neural contrastive learning. We perform rigorous evaluations on three slang dictionaries and show that our approach not only outperforms state-of-the-art language models, but also better predicts the historical emergence of slang word usages from 1960s to 2000s. We interpret the proposed models and find that the contrastively learned semantic space is sensitive to the similarities between slang and conventional senses of words. Our work creates opportunities for the automated generation and interpretation of informal language.


“…Wikipedia) to transfer to another (e.g. social media) (Eisenstein, 2013b; Baldwin et al, 2013b; Belinkov and Bisk, 2018; Pei et al, 2019). Worse yet, for an overwhelming majority of lower resource languages, unstructured and unlabeled text on the Internet is often the sole source of data to train NLP systems (Joshi et al, 2020).…”

Section: Related Work (mentioning)

confidence: 99%

“…Most of the dataset: formal-informal word pairs labeled with their word formation used to train these models are also in English. Other dictionaries of informal English words include SlangNet (Dhuliawala et al, 2016), SlangSD (Wu et al, 2018), and SLANGZY (Pei et al, 2019). There is also a dataset that contains pairs of formal-informal Indonesian words (Salsabila et al, 2018), but they are not annotated with word formation mechanisms.…”

Section: Related Work (mentioning)

confidence: 99%

“…In Korean, some compounded or shortened version of Konglish is also widely used (Khan and Choi, 2016), e.g., chimaek from chicken and maek ('beer'). Any insight we obtain through evaluating models on our dataset may therefore be of interest to other languages that share similar colloquial transformations; insights that may be increasingly paramount due to the rising prevalence of non-standard text in many languages on the web (Kulkarni and Wang, 2018; Joshi et al, 2020) and the challenges they pose to NLP systems (Belinkov and Bisk, 2018; Pei et al, 2019).…”

Section: Indonesian Colloquial Words (mentioning)

confidence: 99%


IndoCollex: A Testbed for Morphological Transformation of Indonesian Word Colloquialism

Wibowo, Nityasya, Akyürek et al. 2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


Indonesian language is heavily riddled with colloquialism whether in written or spoken forms. In this paper, we identify a class of Indonesian colloquial words that have undergone morphological transformations from their standard forms, categorize their word formations, and propose a benchmark dataset of Indonesian Colloquial Lexicons (IndoCollex) consisting of informal words on Twitter expertly annotated with their standard forms and their word formation types/tags. We evaluate several models for character-level transduction to perform morphological word normalization on this testbed to understand their failure cases and provide baselines for future work. As IndoCollex catalogues word formation phenomena that are also present in the non-standard text of other languages, it can also provide an attractive testbed for methods tailored for cross-lingual word normalization and non-standard word formation.
