Machine learning (ML) and deep learning (DL) are now ubiquitous in our society, and techniques that enable their responsible use are fundamental to safeguard people from being negatively affected. One notable example of DL's success is image classification. However, DL techniques function as black-box models whose knowledge representation is difficult to comprehend, and understanding the conditions under which they behave correctly is hard. Another example of a DL application is data generation, where Generative Adversarial Networks (GANs), mainly used for data augmentation, have achieved remarkable success. In a GAN, two networks, the generator and the discriminator, are trained simultaneously: the generator learns to produce realistic data by trying to fool the discriminator, which is trained to distinguish between real and fake samples.

This dissertation proposes a GAN-based approach for synthesizing new data to help understand DL image classifiers. We aim to generate examples that are hard for a given classifier and that we can, ultimately, systematically analyze to learn about the cases where the model's performance degrades. To that end, we generate data that the classifier labels with low confidence. Our approach, dubbed GASTeN, modifies the loss function of the generator to include a new objective, the confusion distance, which reflects how far the generated images are from the desired output of the target classifier, i.e., the one we wish to evaluate. It introduces two hyperparameters: a weight for the new loss term and the duration of the GAN's unmodified pre-training.

We empirically evaluate our proposal by instantiating it with a DCGAN architecture and a confusion distance suitable for binary classification. In our experiments, we target classifiers of binary subsets of the MNIST and Fashion MNIST datasets.
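The modified generator objective can be sketched as follows. This is a minimal illustrative sketch, not the dissertation's exact formulation: the confusion distance shown (mean distance of binary-classifier outputs from maximal uncertainty at 0.5) and the names `alpha` and `gasten_generator_loss` are assumptions made for illustration.

```python
def confusion_distance(probs):
    """Illustrative confusion distance for a binary classifier.

    An image classified with probability 0.5 is maximally ambiguous;
    the further the classifier's outputs are from 0.5, the more
    confidently it decides. A batch of maximally confusing images
    therefore has a confusion distance of 0.
    """
    return sum(abs(p - 0.5) for p in probs) / len(probs)


def gasten_generator_loss(adversarial_loss, classifier_probs, alpha):
    """Generator objective sketch: the standard GAN adversarial loss
    plus the confusion-distance term weighted by the hyperparameter
    alpha (one of the two hyperparameters introduced by the approach)."""
    return adversarial_loss + alpha * confusion_distance(classifier_probs)


# A batch the classifier labels confidently is penalized...
high_confidence = gasten_generator_loss(1.0, [0.01, 0.99], alpha=2.0)
# ...while a maximally ambiguous batch adds no extra loss.
low_confidence = gasten_generator_loss(1.0, [0.5, 0.5], alpha=2.0)
```

Setting `alpha = 0` recovers the unmodified GAN objective, which is how the other hyperparameter, the pre-training duration, fits in: the GAN is first trained without the new term before it is switched on.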
We explore several hyperparameter configurations and target classifiers with different performances, analyzing the algorithm's behavior by collecting quantitative metrics for the two optimization objectives: FID for image realism and the average confusion distance for the goal of confusing the classifier. Our experiments show scenarios in which we obtain a generator with the desired properties, producing highly realistic data that the target classifier mostly labels with low confidence, along with scenarios where this goal is not attained. We conclude that, while optimizing for both objectives simultaneously is challenging, it is possible to generate images with the desired properties, albeit at the cost of careful hyperparameter tuning.