The growing number of uncertain components in power systems has fostered the wide application of Machine Learning (ML) techniques. Traditional ML models demand large datasets, yet data are scarce for new meters, new devices, and new grids. Moreover, even when historical measurements are abundant, valuable data may still be limited, especially for targets such as identifying system events that rarely occur in the power system. To enhance event type differentiation and event localization for a data-limited grid, we propose a Transfer Learning (TL) framework that transfers knowledge from a data-rich grid (source grid) to the data-limited grid (target grid), using measurements from Phasor Measurement Units (PMUs). The transfer is challenging because of (1) high-volume data with redundant information, (2) different measurement dimensionalities, (3) dissimilar data distributions, and (4) disjoint event-location-label spaces of the two grids. To handle challenges (1) to (3), we propose a joint optimization that reduces dimensionality and maximizes common knowledge in a shared low-dimensional feature space, where commonality means identical dimensions and close data distributions. This optimization-based procedure is verified via rigorous mathematical theorems when the two grids share the same label space, i.e., the event-type-label space. For event localization, however, challenge (4) obstructs the optimization. Therefore, we design a label space alignment method that relabels each event location by its event zone and formulates an event zone estimation problem. With this alignment, the framework generalizes to both tasks. Finally, comprehensive experiments demonstrate the advantages of the proposed methods over state-of-the-art transfer learning models.
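
As a rough illustration of challenges (2) and (4), the sketch below maps measurements of different dimensionality into feature spaces of the same size and relabels grid-specific bus locations by zone so that both grids share one label space. It is a minimal sketch under stated assumptions, not the proposed method: per-grid PCA stands in for the joint optimization, the data are random placeholders, and the function names, bus names, and bus-to-zone maps are all hypothetical.

```python
import numpy as np


def pca_project(X, d):
    """Project rows of X (samples x features) onto its top-d principal directions."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T


def relabel_by_zone(bus_labels, bus_to_zone):
    """Replace grid-specific bus labels with zone labels shared by both grids."""
    return np.array([bus_to_zone[b] for b in bus_labels])


rng = np.random.default_rng(0)
X_src = rng.normal(size=(200, 30))  # placeholder source grid: 30 channels, many events
X_tgt = rng.normal(size=(20, 12))   # placeholder target grid: 12 channels, few events

d = 5  # assumed size of the low-dimensional feature space
Z_src = pca_project(X_src, d)
Z_tgt = pca_project(X_tgt, d)

# Hypothetical bus-to-zone maps; the bus names differ per grid, the zone indices do not.
src_zone = {"S1": 0, "S2": 0, "S3": 1, "S4": 2}
tgt_zone = {"T1": 0, "T2": 1, "T3": 2}

y_src = relabel_by_zone(["S1", "S3", "S4", "S2"], src_zone)
y_tgt = relabel_by_zone(["T2", "T1", "T3"], tgt_zone)
```

After this step both grids have features of dimension `d` and labels drawn from the same zone set. Independent PCA does nothing to bring the two data distributions close, which is the part the joint optimization is meant to address.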