Good characterization of traffic interactions among urban roads can facilitate traffic-related applications such as traffic control and short-term forecasting. Most studies measure the traffic interaction between two roads by their topological distance or by the correlation between their traffic variables. However, distance-based methods neglect the spatial heterogeneity of traffic interactions among roads, while correlation-based methods cannot capture non-linear dependencies between two roads' traffic variables. In this paper, we propose a novel approach, Road2Vec, that quantifies the implicit traffic interactions among roads from large-scale taxi route data using Word2Vec, a model from the natural language processing (NLP) field. First, we establish an analogy between transportation elements and NLP terms: a road segment corresponds to a word, and a travel route corresponds to a document. Second, real-valued vectors for road segments are learned from a large corpus of travel routes with the Word2Vec model. Third, the traffic interaction between any pair of roads is measured by the cosine similarity of their vectors. A case study on short-term traffic forecasting with artificial neural network (ANN) and support vector machine (SVM) algorithms is conducted to evaluate the proposed method. The results show that forecasts supported by Road2Vec achieve higher accuracy than those based on topological distance or traffic correlation. We argue that Road2Vec can effectively quantify complex traffic interactions among roads and capture their underlying heterogeneous and non-linear properties.
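The three steps above can be illustrated with a minimal sketch. It assumes each taxi route is an ordered list of road-segment IDs (treated as a "document" of "words"), that training uses skip-gram-style context windows, and that the road IDs, window size, and vectors shown are hypothetical placeholders rather than values from the study. The sketch only builds the (center, context) training pairs and computes the cosine similarity; it does not train the embeddings.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical taxi routes: each route is an ordered list of road-segment IDs,
# playing the role of a "document" whose "words" are road segments.
routes: List[List[str]] = [
    ["r1", "r2", "r3", "r4"],
    ["r2", "r3", "r5"],
    ["r1", "r2", "r3"],
]


def skipgram_pairs(route: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every segment within `window` positions,
    i.e. the kind of input a skip-gram Word2Vec model is trained on."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(route):
        lo = max(0, i - window)
        hi = min(len(route), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a segment is not its own context
                pairs.append((center, route[j]))
    return pairs


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two road vectors, used as the interaction score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical (untrained) vectors, only to show how interaction scores are read.
vectors = {
    "r2": [0.9, 0.1, 0.3],
    "r3": [0.8, 0.2, 0.4],
    "r5": [-0.1, 0.9, 0.0],
}

if __name__ == "__main__":
    corpus_pairs = [p for route in routes for p in skipgram_pairs(route, window=1)]
    print(len(corpus_pairs), "training pairs")
    print("r2-r3:", round(cosine_similarity(vectors["r2"], vectors["r3"]), 3))
    print("r2-r5:", round(cosine_similarity(vectors["r2"], vectors["r5"]), 3))
```

In practice the vectors would be learned rather than hand-written; with the gensim library, for example, one could pass the route lists as sentences to `gensim.models.Word2Vec(sentences=routes, vector_size=..., window=..., min_count=1, sg=1)` and read scores with `model.wv.similarity("r2", "r3")`. The hyperparameters the authors used are not stated here.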