Interspeech 2020
DOI: 10.21437/interspeech.2020-2380

Hide and Speak: Towards Deep Neural Networks for Speech Steganography

Abstract: Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as the carrier. Traditionally, digital signal processing techniques, such as least significant bit encoding, were used for hiding messages. In this paper, we explore the use of deep neural networks as steganographic functions for speech data. To this end, we propose to jointly optimize two neural networks: the first network encodes the message inside a carrier, while the second network decodes the message f…
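
For contrast with the learned approach, the traditional least significant bit (LSB) encoding mentioned in the abstract can be sketched as follows. This is a minimal illustration and not code from the paper: it assumes the carrier is a list of integer PCM samples, and the helper names (`bytes_to_bits`, `bits_to_bytes`, `lsb_embed`, `lsb_extract`) are hypothetical.

```python
def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Pack a list of bits (MSB first, length a multiple of 8) into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def lsb_embed(carrier, bits):
    """Overwrite the lowest bit of the first len(bits) samples with the message bits."""
    if len(bits) > len(carrier):
        raise ValueError("message is longer than the carrier")
    stego = list(carrier)  # copy so the original carrier is left untouched
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego


def lsb_extract(stego, n_bits):
    """Read the message back from the lowest bit of the first n_bits samples."""
    return [sample & 1 for sample in stego[:n_bits]]


if __name__ == "__main__":
    carrier = list(range(-20, 20))   # stand-in for real audio samples (assumption)
    message = b"hi"
    bits = bytes_to_bits(message)
    stego = lsb_embed(carrier, bits)
    recovered = bits_to_bytes(lsb_extract(stego, len(bits)))
    print(recovered)
```

Each sample changes by at most one quantization step, which is why the method is hard to hear, but the hidden bits are also easy to destroy with any re-encoding or added noise.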

Citations: cited by 25 publications (34 citation statements)
References: 22 publications
“…We propose to generate imperceptible adversarial perturbations by training a Gated Convolutional Autoencoder (GCA) composed of an encoder and a decoder, operating in the frequency domain (see Figure 1). The architecture of our GCA is inspired by the steganographic method of [14] in the sense of the encoder and decoder. The encoder, E(•), creates a latent representation of the spectral representation s of the original audio file, h = E(s) through three gated convolutional layers.…”
Section: Overview (citation type: mentioning)
confidence: 99%
“…The decoder, D(•), takes ḣ and generates the adversarial spectral representation of the audio as ṡ = D(ḣ) through four gated convolutional layers. Each gated convolutional layer of encoder and decoder, similarly to [14], is composed of 64 3 × 3 kernels, followed by a batch normalization and a dropout layer. However, our GCA has two main differences from [14].…”
Section: Overview (citation type: mentioning)
confidence: 99%