Towards Near-imperceptible Steganographic Text

Dai, Falcon Z.; Cai, Zheng

doi:10.18653/v1/p19-1422

Cited by 38 publications

(19 citation statements)

References 13 publications


“…We found that 1.41% of the masked tokens had substitution candidates that did not reproduce the original segmentations. Although this danger applies equally to generation-based steganography built on top of subword LMs (Dai and Cai, 2019; Ziegler et al, 2019; Shen et al, 2020), to our knowledge, we are the first to point it out.…”

Section: Results (mentioning)

confidence: 96%

“…Let $n$ be the largest integer that satisfies $2^n \le c$, where $c$ is the number of the remaining items. Each item is given a unique bit chunk of size $n$. Coding is an active research topic (Dai and Cai, 2019; Ziegler et al, 2019; Shen et al, 2020) and is orthogonal to our core proposal.…”

Section: Encoding Strategy (mentioning)

confidence: 99%
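
The chunk assignment described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the citing authors' code: the function name `assign_bit_chunks` is hypothetical, and keeping only the first 2^n candidates (so that every n-bit chunk maps to exactly one item) is an assumption, since the excerpt does not say what happens to the surplus items.

```python
def assign_bit_chunks(candidates):
    """Map unique n-bit chunks to candidates, with n the largest integer
    such that 2**n <= len(candidates).

    Assumption: candidates are already ranked, and only the first 2**n
    of them receive a chunk.
    """
    c = len(candidates)
    if c < 1:
        raise ValueError("need at least one candidate")
    n = c.bit_length() - 1  # largest n with 2**n <= c
    if n == 0:
        # A single usable candidate carries no payload.
        return 0, {}
    kept = candidates[: 2 ** n]
    # Zero-padded binary strings of width n, one per kept candidate.
    chunks = {format(i, f"0{n}b"): item for i, item in enumerate(kept)}
    return n, chunks


n, chunks = assign_bit_chunks(["cat", "dog", "bird", "fish", "frog"])
print(n, chunks)
```

With five candidates, n is 2, so each chosen item encodes two secret bits and the fifth candidate is left unused under the assumption above.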

“…With advances in neural language models (LMs), edit-based approaches have been replaced by generation-based ones (Fang et al, 2017; Yang et al, 2019; Dai and Cai, 2019; Ziegler et al, 2019; Shen et al, 2020). In these approaches, bit chunks are directly assigned to the conditional probability distribution over the next word estimated by the LM, yielding impressive payload capacities of 1-5 bits per word (Shen et al, 2020).…”

Section: Introduction (mentioning)

confidence: 99%


Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model

Ueoka, Murawaki, Kurohashi (2021)

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua


With advances in neural language models, the focus of linguistic steganography has shifted from edit-based approaches to generation-based ones. While the latter's payload capacity is impressive, generating genuine-looking texts remains challenging. In this paper, we revisit edit-based linguistic steganography, with the idea that a masked language model offers an off-the-shelf solution. The proposed method eliminates painstaking rule construction and has a high payload capacity for an edit-based model. It is also shown to be more secure against automatic detection than a generation-based method while offering better control of the security/payload capacity tradeoff.

“…Then they took the token which has the same code as the secret information. Dai and Cai (2019) proposed patient-Huffman, which was an improved version of Yang et al (2018a) that sacrificed embedding capacity for imperceptibility. They first calculated the distortion (total variation distance or KL divergence) between $q$ and $p_{LM}$ and then only used Huffman coding embedding algorithm to embed secret information when the distortion was less than a preset threshold $\delta$.…”

Section: Imperceptibility (mentioning)

confidence: 99%
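
A rough sketch of the thresholded embedding step summarized in the statement above, written from that description only and not taken from Dai and Cai's implementation. All function names are hypothetical. Two points are assumptions: q is taken to be the distribution that uniformly random message bits induce through the Huffman code (each token gets probability 2^-(code length)), and when the distortion is not below δ the step samples from the LM distribution and embeds no bits. Every probability in `p_lm` is assumed to be positive.

```python
import heapq
import itertools
import math
import random


def huffman_codes(p):
    """Build a binary Huffman code for a dict mapping token -> probability."""
    if len(p) == 1:
        return {next(iter(p)): ""}
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(prob, next(tie), {tok: ""}) for tok, prob in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {t: "0" + code for t, code in left.items()}
        merged.update({t: "1" + code for t, code in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def implied_distribution(codes):
    """Uniform random bits pick a token with probability 2**-len(code)."""
    return {t: 2.0 ** -len(code) for t, code in codes.items()}


def total_variation(q, p):
    return 0.5 * sum(abs(q[t] - p[t]) for t in p)


def kl_divergence(q, p):
    # KL(q || p) in bits; assumes p[t] > 0 for every token.
    return sum(q[t] * math.log2(q[t] / p[t]) for t in p if q[t] > 0)


def patient_huffman_step(p_lm, bits, delta, distance=total_variation, rng=random):
    """Return (token, bits_consumed) for one generation step."""
    codes = huffman_codes(p_lm)
    q = implied_distribution(codes)
    if distance(q, p_lm) < delta:
        # Distortion is small enough: emit the token whose code prefixes the message.
        for tok, code in codes.items():
            if code and bits.startswith(code):
                return tok, len(code)
    # Otherwise (assumed fallback): sample from the LM and embed nothing.
    tokens = list(p_lm)
    tok = rng.choices(tokens, weights=[p_lm[t] for t in tokens])[0]
    return tok, 0


p_lm = {"the": 0.5, "a": 0.25, "an": 0.25}
print(patient_huffman_step(p_lm, "10110", delta=0.1))
```

A smaller δ makes the step skip embedding more often, which is the trade of capacity for imperceptibility that the statement describes.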

“…In recent years, powered by the advanced technology of deep learning and natural language processing, language models based on neural networks have made significant progress in generating fluent text (Radford et al, 2019; Brown et al, 2020), which bring new vitality to linguistic steganography and facilitate the investigation of generation-based methods (Fang et al, 2017; Yang et al, 2018a; Dai and Cai, 2019; Ziegler et al, 2019; Yang et al, 2020a; Zhou et al, 2021). The generative linguistic steganography directly transform secret information into innocuous-looking steganographic text (stegotext) without any covertext.…”

Section: Introduction (mentioning)

confidence: 99%

Provably Secure Generative Linguistic Steganography

Zhang, Yang, et al. (2021)

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


Generative linguistic steganography mainly utilized language models and applied steganographic sampling (stegosampling) to generate high-security steganographic text (stegotext). However, previous methods generally lead to statistical differences between the conditional probability distributions of stegotext and natural text, which brings about security risks. In this paper, to further ensure security, we present a novel provably secure generative linguistic steganographic method ADG, which recursively embeds secret information by Adaptive Dynamic Grouping of tokens according to their probability given by an off-the-shelf language model. We not only prove the security of ADG mathematically, but also conduct extensive experiments on three public corpora to further verify its imperceptibility. The experimental results reveal that the proposed method is able to generate stegotext with nearly perfect security.