Bike-sharing systems, aiming at meeting the public’s need for ”last mile” transportation, are becoming popular in recent years. With an accurate demand prediction model, shared bikes, though with a limited amount, can be effectively utilized whenever and wherever there are travel demands. Despite that some deep learning methods, especially long shortterm memory neural networks (LSTMs), can improve the performance of traditional demand prediction methods only based on temporal representation, such improvement is limited due to a lack of mining complex spatial-temporal relations. To address this issue, we proposed a novel model named STG2Vec to learn the representation from heterogeneous spatial-temporal graph. Specifically, we developed an event-flow serializing method to encode the evolution of dynamic heterogeneous graph into a special language pattern such as word sequence in a corpus. Furthermore, a dynamic attention-based graph embedding model is introduced to obtain an importance-awareness vectorized representation of the event flow. Additionally, together with other multi-source information such as geographical position, historical transition patterns and weather, e.g., the representation learned by STG2Vec can be fed into the LSTMs for temporal modeling. Experimental results from Citi-Bike electronic usage records dataset in New York City have illustrated that the proposed model can achieve competitive prediction performance compared with its variants and other baseline models.