“…Toy Experiment: In this experiment, the VAE encoder consists of three convolutional layers with filter sizes of [(3,3), (4,4), (5,5)], strides of [1, 2, 2], and padding of [1, 1, 2], respectively. Similarly, the decoder consists of three convolutional layers with filter sizes of [(6,6), (6,6), (5,5)], strides of [2, 2, 1], and padding of [2, 2, 2], respectively.…”
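
To make the layer arithmetic concrete, the sketch below traces the spatial size of a feature map through the six layers quoted above. It rests on several assumptions that the excerpt does not state: the decoder layers are taken to be transposed convolutions (the excerpt only says "CNN layers"), dilation is 1, there is no output padding, and the 32-pixel input side is a hypothetical value chosen for illustration. The helper names `conv_out`, `deconv_out`, and `trace_sizes` are likewise illustrative.

```python
# Spatial-size bookkeeping for the quoted encoder/decoder configuration.
# Assumptions (not confirmed by the excerpt): decoder layers are transposed
# convolutions, dilation = 1, output padding = 0, square inputs.

def conv_out(size, kernel, stride, padding):
    """Output side length of a standard convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding):
    """Output side length of a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) per layer, copied from the excerpt.
ENCODER = [(3, 1, 1), (4, 2, 1), (5, 2, 2)]
DECODER = [(6, 2, 2), (6, 2, 2), (5, 1, 2)]


def trace_sizes(size):
    """Return the side length at the input and after each of the six layers."""
    sizes = [size]
    for kernel, stride, padding in ENCODER:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    for kernel, stride, padding in DECODER:
        size = deconv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # Hypothetical 32x32 input; the excerpt does not give the input size.
    print(trace_sizes(32))
```

Under these assumptions the encoder halves the side length twice and the decoder doubles it twice, so an input whose side is a multiple of 4 comes back at its original resolution. If the decoder instead used ordinary convolutions with upsampling, the sizes would differ.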