Generative adversarial networks (GANs) learn high-dimensional vector spaces (latent spaces) whose vectors can be represented interchangeably as images. Recent advances have extended their ability to computationally generate images that are indistinguishable from real ones, such as faces, and, more importantly, to manipulate images through their inherent vector values in the latent space. This interchangeability of latent vectors makes it possible to calculate not only distance in the latent space but potentially also human perceptual and cognitive distance between images, i.e., how humans perceive and recognize images. However, it is still unclear how distance in the latent space corresponds to human perception and cognition. Our studies investigated the correspondence between latent vectors and human perception or cognition through psycho-visual experiments that manipulate the latent vectors of face images. In the perception study, a change perception (CP) task was used to examine whether participants could perceive visual changes in face images before and after moving an arbitrary distance in the latent space. In the cognition study, a face cognition (FC) task was used to examine whether participants could recognize a face as the same even after it was moved an arbitrary distance in the latent space. The results showed that both CP and FC for face images clearly correlate with distance in the latent space, and that the relationship can be modeled with a logistic function. We also investigated how the internal layered structure of the latent space correlates with human responses by calculating the regression residual error for each layer, and observed different residual error trends for CP and FC. Our experiments show that the distance between face images in the latent space corresponds to human perception and cognition of visual changes in face imagery, and additionally indicate that perception and cognition correspond to the latent space differently.
With our methodology, distances in the latent space can be converted to and from measures of human perception and cognition, potentially leading to image processing that better reflects human perception and cognition.
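As an illustration of the logistic modeling mentioned above, the following minimal sketch fits a two-parameter logistic curve to response rates as a function of latent distance. It is not the fitting procedure used in the studies: the parameterization (slope `k`, midpoint `d0`), the least-squares gradient-descent fit, the function names, and the data points are all assumptions made for this example, and the response rates are synthetic values generated from a known curve rather than experimental results.

```python
import math

def logistic(d, k, d0):
    """Modeled probability of a 'changed' response at latent distance d.

    k (slope) and d0 (midpoint) are hypothetical parameter names.
    """
    return 1.0 / (1.0 + math.exp(-k * (d - d0)))

def fit_logistic(distances, rates, lr=0.5, steps=10000):
    """Least-squares fit of (k, d0) by plain gradient descent."""
    k = 1.0
    d0 = sum(distances) / len(distances)  # start at the mean distance
    n = len(distances)
    for _ in range(steps):
        grad_k = 0.0
        grad_d0 = 0.0
        for d, r in zip(distances, rates):
            p = logistic(d, k, d0)
            # Derivative of (p - r)^2 with respect to the logistic argument
            common = 2.0 * (p - r) * p * (1.0 - p)
            grad_k += common * (d - d0)
            grad_d0 += common * (-k)
        k -= lr * grad_k / n
        d0 -= lr * grad_d0 / n
    return k, d0

# Synthetic, noise-free response rates (NOT experimental data),
# generated from a known curve with k = 2.0 and d0 = 2.0.
distances = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
rates = [logistic(d, 2.0, 2.0) for d in distances]

k_hat, d0_hat = fit_logistic(distances, rates)
print(f"fitted slope k = {k_hat:.3f}, midpoint d0 = {d0_hat:.3f}")
```

Under this parameterization, `d0` is the latent distance at which the modeled response probability reaches 0.5, which gives one possible way to convert between a latent distance and a response rate.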