Multivariate spatial data, where multiple responses are recorded simultaneously across spatially indexed observational units, are routinely collected in a wide variety of disciplines. For example, the Southern Ocean Continuous Plankton Recorder survey collects records of zooplankton communities in the Indian sector of the Southern Ocean, with the aim of identifying and quantifying spatial patterns in biodiversity in response to environmental change. One increasingly popular method for modeling such data is the spatial generalized linear latent variable model (GLLVM), in which the correlation across sites is captured by a spatial covariance function on the latent variables. However, little is known about how misspecifying the latent variable correlation structure affects inference on the various parameters in such models. To address this gap in the literature, we investigate how misspecifying the correlation structure of the latent variables, or ignoring it altogether by assuming independence, impacts estimation and inference in spatial GLLVMs. Through both theory and numerical studies, we show that under misspecification the performance of maximum likelihood estimation and inference on the regression coefficients depends on a combination of the response type, the magnitudes of the true regression coefficient and of the corresponding loadings, and, most importantly, whether the corresponding covariate is also spatially correlated. On the other hand, estimation and inference for truly non-zero loadings, as well as prediction of the latent variables, are consistently not robust to misspecification of the latent variable correlation structure.