We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to keep the text content and remove toxicity. Our second method uses BERT to replace toxic words with their non-offensive synonyms. We make the method more flexible by enabling BERT to replace mask tokens with a variable number of words. Finally, we present the first large-scale comparative study of style transfer models on the task of toxicity removal. We compare our models with a number of methods for style transfer. The models are evaluated in a reference-free way using a combination of unsupervised style transfer metrics. Both methods we suggest yield new state-of-the-art (SOTA) results.
It is well known that disagreements about cotton color grades between high volume instruments and classers are substantial, and these machine-classer disagreements deter full acceptance of machine grading of cotton color. This paper first provides a quantitative analysis of the distributions of these disagreements across all the color grades, in both major and subcolor categories. The study shows that the disagreements can be both systematic and random, and further analyzes their possible sources. Second, the paper introduces a novel design of a neural network classifier for cotton color classification. This classifier consists of multiple networks performing a two-step classification that identifies major and subcolor categories separately. The classifier can be trained on any desired data. In this research, it is trained using a set of classers' grades, and it generalizes well to new test data. The classifier seems to reduce machine-classer disagreements to a minimal level, which is limited by the classer's reproducibility.
In this paper, we present the system we used in the Taxonomy Enrichment for the Russian Language evaluation campaign. The goal of this challenge is to predict hypernyms for the words not included in the taxonomy. Our approach was to generate and score candidate hypernyms by word embedding similarity of the input words and concepts already in the taxonomy. Despite being very simple, our system was ranked first on the verbs track.
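The candidate generation and scoring described above can be illustrated with a minimal sketch. It assumes one plausible reading of the approach: find the taxonomy concepts nearest to the input word in embedding space, collect their known hypernyms as candidates, and score each candidate by the summed similarity of the neighbours that propose it. The function names, the toy vectors, and the toy taxonomy are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_hypernyms(word_vec, concept_vecs, hypernyms_of, k=2):
    """Rank candidate hypernyms for a word that is missing from the taxonomy.

    Assumption (not from the paper): candidates are the hypernyms of the k
    most similar taxonomy concepts, scored by summed cosine similarity.
    """
    sims = {c: cosine(word_vec, vec) for c, vec in concept_vecs.items()}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    scores = defaultdict(float)
    for concept in neighbours:
        for hypernym in hypernyms_of.get(concept, []):
            scores[hypernym] += sims[concept]
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: hypothetical 2-d embeddings and a hypothetical verb taxonomy.
concept_vecs = {"run": [1.0, 0.1], "walk": [0.9, 0.2], "think": [0.0, 1.0]}
hypernyms_of = {"run": ["move"], "walk": ["move"], "think": ["reason"]}

ranked_k2 = rank_hypernyms([1.0, 0.15], concept_vecs, hypernyms_of, k=2)
ranked_k3 = rank_hypernyms([1.0, 0.15], concept_vecs, hypernyms_of, k=3)
print(ranked_k2, ranked_k3)
```

In a real setting the vectors would come from pretrained word embeddings and the hypernym map from the existing taxonomy; the choice of `k` and of the scoring rule is a design decision that this sketch does not settle.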
We introduce the first study of the automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used for processing toxic content on social media or for eliminating toxicity in automatically generated texts. While much work has been done for the English language in this field, there is no work on detoxification for the Russian language. We suggest two types of models: an approach based on BERT architecture that performs local corrections and a supervised approach based on a pretrained GPT-2 language model. We compare these methods with several baselines. In addition, we provide the training datasets and describe the evaluation setup and metrics for automatic and manual evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
This paper describes the application of fuzzy logic to cotton color grading in an attempt to improve the acceptance of machine grading for cotton colors. Cotton color grades are a number of classes in the (Rd, b) color space. Adjacent color classes have blurry and overlapping boundaries, making crisp-boundary methods ineffective for cotton color classification. Fuzzy logic is specialized to deal with uncertainty and imprecision in the decision-making process, and thus offers a new approach for grading cotton colors. In this paper, we present the procedures for constructing a fuzzy inference system (FIS) using fuzzy logic to classify major classes of cotton colors, and preliminary results that demonstrate the effectiveness of the FIS in reducing machine-classer disagreements in color grading. The results from the FIS show great consistency across multiple years of cotton color data.
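The idea of overlapping class boundaries in the (Rd, b) space can be sketched as follows. This is only an illustration of fuzzy membership, not the paper's FIS: the triangular membership functions, the min rule for combining Rd and b, and every breakpoint value below are hypothetical assumptions chosen for the example.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets per major color class: (left, peak, right) for
# reflectance Rd and yellowness b. The breakpoints are invented for
# illustration and are NOT official grade boundaries.
CLASSES = {
    "white": {"rd": (65.0, 78.0, 90.0), "b": (5.0, 8.0, 11.0)},
    "light_spotted": {"rd": (60.0, 72.0, 85.0), "b": (9.0, 11.5, 14.0)},
    "spotted": {"rd": (55.0, 68.0, 80.0), "b": (12.0, 14.5, 17.0)},
}


def classify(rd, b):
    """Return the class with the highest membership and all memberships.

    Assumption: a sample belongs to a class to the degree that BOTH its Rd
    and its b fit that class, so the two memberships are combined with min.
    """
    memberships = {
        name: min(triangular(rd, *sets["rd"]), triangular(b, *sets["b"]))
        for name, sets in CLASSES.items()
    }
    best = max(memberships, key=memberships.get)
    return best, memberships


best_clear, _ = classify(75.0, 8.0)            # well inside one class
best_border, m_border = classify(75.0, 10.0)   # in an overlap region
print(best_clear, best_border, m_border)
```

A sample in an overlap region receives nonzero membership in two adjacent classes, which is the situation a crisp boundary cannot represent; how those memberships are turned into a final grade is a design choice that a full FIS would specify through its rule base.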