Understanding and learning the characteristics of network paths has been of interest for decades and has led to several successful applications. Such analysis is challenging for urban networks, whose size and complexity are significantly higher than those of other networks. State-of-the-art machine learning (ML) techniques can detect hidden patterns and thus infer the features associated with them. However, very little is known about how the choice of input representation affects the performance of such predictive models. In this paper, we design and evaluate six graph input representations (i.e., representations of network paths) that take into account the network's topological and temporal characteristics and serve as inputs for ML models that learn the behavior of urban network paths. The representations are validated and then tested on a real-world dataset of taxi journeys, predicting tips over a road network of New York. Our results demonstrate that the input representations that use temporal information help the model achieve the highest accuracy (RMSE of $1.42).

Recently, machine learning (ML) techniques ([28, 27, 23]) have been applied to the analysis of urban networks ([11, 17, 25]). However, due to memory and computational limitations, analyzing these networks remains challenging as their size increases (as with the road or transport networks of large cities). This is also because, unlike other types of data that are easily transformed into, for example, time series or grids (e.g., an image can be represented as a matrix of pixels), there is no standard way of representing network paths as input to ML models. When the training dataset is large, ML algorithms such as Random Decision Forests or Deep Neural Networks require a significant amount of memory and computational power to train their models ([21, 19]).
For this reason, traditional path representations (such as the adjacency matrix) can be used only when the networks are small, since the size of such a representation grows quadratically with the number of nodes in the network.
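
The scaling of a dense adjacency-matrix encoding can be made concrete with a small sketch: a matrix over n nodes has n × n entries regardless of how short the encoded path is, whereas an edge list grows only with the path length. The function names, the one-byte-per-entry assumption, and the toy path below are illustrative assumptions and are not taken from the representations evaluated in this paper.

```python
def dense_adjacency_bytes(num_nodes: int, bytes_per_entry: int = 1) -> int:
    """Memory needed for a dense num_nodes x num_nodes adjacency matrix.

    Assumes one byte per entry by default (an illustrative choice).
    """
    return num_nodes * num_nodes * bytes_per_entry


def path_as_adjacency(path, num_nodes):
    """Encode a path (a sequence of node ids) as a dense 0/1 matrix."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in zip(path, path[1:]):
        matrix[u][v] = 1  # mark the directed edge u -> v as used by the path
    return matrix


def path_as_edge_list(path):
    """Encode the same path as a list of (source, target) pairs."""
    return list(zip(path, path[1:]))


# Toy path over a hypothetical 4-node network: 0 -> 2 -> 3.
toy_path = [0, 2, 3]
print(path_as_adjacency(toy_path, num_nodes=4))
print(path_as_edge_list(toy_path))

# The dense encoding depends on the network size, not on the path length.
for n in (1_000, 10_000, 100_000):
    print(n, "nodes:", dense_adjacency_bytes(n) / 1e9, "GB per path")
```

Under these assumptions, multiplying the number of nodes by ten multiplies the dense encoding of a single path by one hundred, which is why per-path matrices quickly become impractical for city-scale road networks.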