The style of a typeface (font) evokes specific impressions, which suggests that font shapes and the impressions they convey are correlated. Based on this hypothesis, we construct a shared latent space in which a font shape image and its impression words are embedded in a cross-modal manner. This latent space is useful for understanding the style-impression correlation and for generating font images from several specified impression words. Experimental results on a large style-impression dataset show that the shared latent space can be realized accurately, especially for shape-relevant impression words, and that it can then be used to generate font images with various impressions.