Zero-shot learning (ZSL) aims to recognize examples (e.g., images) of unseen classes with knowledge transferred from seen classes. This is typically achieved by exploiting a semantic feature space shared by both seen and unseen classes, e.g., attributes or word vectors, as the bridge. The common practice in ZSL is to train a mapping function between the visual and semantic feature spaces with labeled seen-class examples. At inference time, the learned mapping function is applied to unseen-class examples, and their class labels are recognized according to some metric over their semantic relations. However, the visual and semantic feature spaces are generally independent and lie on entirely different manifolds. Under such a paradigm, ZSL models easily suffer from the domain shift problem when constructing and reusing the mapping function, which is the major challenge in ZSL. In this thesis, we explore effective ways to mitigate the domain shift problem and learn a robust mapping function between the visual and semantic feature spaces. We focus on fully empowering the semantic feature space, which is one of the key building blocks of ZSL.

First, we consider adaptively adjusting the semantic feature space to rectify it for ZSL. Conventional ZSL models generally regard the semantic feature space as fixed during training. However, we observe that when visual features are mapped into the semantic feature space, the resulting semantic features are usually overly concentrated, which impairs the models' ability to adapt and generalize to more unseen classes. Just as human understanding of the world is continually refined, we argue that the semantic feature space also needs to be dynamically adjusted to accommodate more robust learning of the mapping function. Specifically, the adjustment is conducted on both the class prototypes and the global distribution during training. Moreover, we propose to combine the adjustment with a cycle mapping to formulate training as a more efficient framework that not only rectifies the semantic feature space but also speeds up the training process.

Second, we consider aligning the manifold structures of the visual and semantic feature spaces by expanding the semantic features. Compared with our first method, which directly rectifies the semantic feature space, the expansion process is more conservative and gentle. Specifically, we build upon an autoencoder-based model to generate several auxiliary semantic features that are combined with the original ones to expand the space. In addition, the expansion is jointly guided by an embedded manifold extracted from the visual feature space, which preserves its geometric and structural information. By aligning the two feature spaces, the trained mapping function becomes more robust and better matched, significantly mitigating the domain shift problem.

Last, we consider taking a further step to explore and empower the correlation between the visual and semantic feature spaces in a more fine-grained pe...