In natural language understanding (NLU), representing word meanings is a fundamental building block for a wide range of applications, including machine translation, question answering, text summarization, information retrieval, and virtual assistants. The growing demand for multilingual reasoning and cross-lingual knowledge transfer has led to the development of cross-lingual semantic spaces, which represent words from different languages in a single shared space.
With the increased emphasis on cross-lingual representations, several methods have been developed. Related works usually differ in their training strategies and evaluate only limited aspects of the resulting semantic spaces. This lack of a meaningful comparison motivated the present study, which we believe is crucial for future research. As a basis for our comparison, we project semantic spaces into a shared space using linear transformations, both supervised by bilingual dictionaries and fully unsupervised. To allow comparison from different points of view, our evaluation includes intrinsic tasks, such as cross-lingual word similarity, cross-lingual word analogy, and word translation, as well as extrinsic tasks, namely sentiment analysis and topic classification. We also analyze hubness to investigate the internal structure of the semantic spaces. Our experiments cover six languages from three different language families: English, German, Italian, Spanish, Croatian, and Czech. Finally, we show that different preprocessing steps can have a significant impact on the performance of cross-lingual semantic spaces.
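To make the supervised setting concrete, the sketch below shows one common instance of a dictionary-supervised linear transformation: the orthogonal Procrustes solution, which maps source-language vectors into the target space by minimizing the Frobenius distance between dictionary-aligned vector pairs. This is a minimal illustration of the general technique, not necessarily the exact formulation used in our experiments; the matrices `X` and `Y` and the toy dimensions are hypothetical.

```python
import numpy as np

def fit_orthogonal_map(X, Y):
    """Learn an orthogonal matrix W minimizing ||XW - Y||_F, where the
    rows of X and Y are source/target word vectors aligned by a
    bilingual dictionary (orthogonal Procrustes, solved via SVD)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Hypothetical toy data: 5 dictionary pairs of 300-dimensional vectors.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 300))  # source-language vectors
Y = rng.standard_normal((5, 300))  # target-language vectors

W = fit_orthogonal_map(X, Y)
projected = X @ W  # source vectors mapped into the target space
```

Constraining the mapping to be orthogonal preserves distances and angles in the source space, which is one reason this family of transformations is a popular supervised baseline.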
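Hubness, mentioned above, can be quantified by the k-occurrence count: how often a vector appears among the k nearest neighbors of the other vectors in the space, with disproportionately frequent vectors acting as hubs. The following sketch, assuming cosine similarity and a hypothetical embedding matrix `E`, illustrates this measure.

```python
def k_occurrences(E, k=10):
    """Count how often each row of E appears among the k nearest
    neighbors (by cosine similarity) of the other rows; unusually
    large counts indicate hubs."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = E @ E.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-neighbors
    knn = np.argsort(-sims, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=E.shape[0])
```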