In this paper, we leverage pre-trained language models (PLMs) to precisely evaluate the semantics preservation of edition process on sentences. Our metric, Neighbor Distribution Divergence (NDD), evaluates the disturbance on predicted distribution of neighboring words from mask language model (MLM). NDD is capable of detecting precise changes in semantics which are easily ignored by text similarity. By exploiting the property of NDD, we implement a unsupervised and even trainingfree algorithm for extractive sentence compression. We show that our NDD-based algorithm outperforms previous perplexity-based unsupervised algorithm by a large margin. For further exploration on interpretability, we evaluate NDD by pruning on syntactic dependency treebanks and apply NDD for predicate detection as well.
Privacy protection is an important and concerning topic in Federated Learning, especially for Natural Language Processing. In client devices, a large number of texts containing personal information are produced by users every day. As the direct application of information from users is likely to invade personal privacy, many methods have been proposed in Federated Learning to block the center model from the raw information in client devices. In this paper, we try to do this more linguistically via distorting the text while preserving the semantics. In practice, we leverage a recently proposed metric, Neighboring Distribution Divergence, to evaluate the semantic preservation during the distortion. Based on the metric, we propose two frameworks for semantics-preserved distortion, a generative one and a substitutive one. Due to the lack of privacy-related tasks in the current Natural Language Processing field, we conduct experiments on named entity recognition and constituency parsing. Results from our experiments show the plausibility and efficiency of our distortion as a method for personal privacy protection. 1
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.