2021
DOI: 10.1016/j.neunet.2021.03.017

CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks

Abstract: How can deep neural networks encode information that corresponds to words in human speech into raw acoustic data? This paper proposes two neural network architectures for modeling unsupervised lexical learning from raw acoustic inputs, ciwGAN (Categorical InfoWaveGAN) and fiwGAN (Featural InfoWaveGAN), that combine a Deep Convolutional GAN architecture for audio data (WaveGAN; Donahue et al. 2019) with an information-theoretic extension of GAN, InfoGAN (Chen et al., 2016), and propose a new latent space structu…

Cited by 23 publications (34 citation statements)
References 25 publications
“…An unresolved question is whether this is enough. Speech recognition has been modeled with more than two layers (Beguš, Gašper 2021) with only modest success, whereas models with only two dense layers are more successful (Arnold et al 2017; Shafaei-Bajestan et al 2020). Multilingual acquisition has also successfully been modeled with only two layers (Chuang et al 2021), as well as several other lexical processing phenomena (Baayen et al 2019b; Baayen & Smolka 2020; Chuang et al 2020c; Tomaschek et al 2019).…”
Section: Discussion (mentioning)
confidence: 99%
“…In the first experiment, we use the ciwGAN (Categorical InfoWaveGAN) model proposed in Beguš (2021a). The ciwGAN model combines the WaveGAN and InfoGAN architectures.…”
Section: Model (mentioning)
confidence: 99%
“…The Discriminator/Q-network learns to retrieve the Generator's latent categorical or continuous codes (Chen et al, 2016) in addition to estimating realness of generated outputs and real training data. Beguš (2021a) proposes a model that combines these two proposals and introduces a new latent space structure (in the fiwGAN architecture). Because we are primarily interested in simple binary classification between bare and reduplicated forms, we use the ciwGAN variant of the proposal.…”
Section: Model (mentioning)
confidence: 99%
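
The statement above describes a Q-network that retrieves the Generator's latent categorical code. A minimal sketch of that idea for the two-class ciwGAN case might look as follows. It is an illustration under stated assumptions, not the implementation from the cited papers: the function names `sample_ciwgan_latent` and `categorical_q_loss` are hypothetical, the noise range of [-1, 1] is assumed from common WaveGAN practice, and the Q-network objective is assumed to be softmax cross-entropy between predicted logits and the sampled class.

```python
import math
import random


def sample_ciwgan_latent(n_classes, n_noise, rng):
    """Build one latent vector: a one-hot class code followed by noise.

    Hypothetical helper; the noise range [-1, 1] is an assumption.
    """
    label = rng.randrange(n_classes)
    code = [1.0 if i == label else 0.0 for i in range(n_classes)]
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n_noise)]
    return code + noise, label


def categorical_q_loss(logits, label):
    """Softmax cross-entropy for a single example.

    Stands in (as an assumption) for the Q-network's objective: it is
    low when the logits assign high probability to the class code that
    was actually fed to the Generator.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


rng = random.Random(0)
# Two classes, e.g. bare vs. reduplicated forms; 5 noise variables is arbitrary.
z, label = sample_ciwgan_latent(n_classes=2, n_noise=5, rng=rng)
```

In a full model, `z` would be passed to the Generator, and the Q-network would produce the logits from the generated audio. Minimizing the loss pushes the Generator to make the class code recoverable from its output.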