Discovering anomalous bus trajectories can help transportation agencies improve their services by enabling them to respond to unexpected events such as detours or accidents. In this work, we propose a deep-learning strategy, which we name Spatial-Temporal Outlier Detector (STOD), that predicts the spatial and temporal anomaly degree of a bus trajectory from learned representations of its GPS points. To compute the anomaly score, STOD learns the regular behavior of bus trajectories by training a model that predicts the route id of a bus. The uncertainty of this prediction, measured by the entropy of the output class probability distribution, serves as the anomaly score of the trajectory. To perform the classification, STOD represents each point of a trajectory as the concatenation of two representations. The first (PAC embedding) is generated by the Point Activity Classifier (PAC), which leverages temporal and spatial features in a stacked deep-learning model to predict the semantics of the point in terms of its bus activity (in route, bus stop, traffic signal, and other stops). The second (Geo embedding) captures the spatial relationship between a point and its geographical neighbors by applying a word embedding technique to the set of all trajectories. The experimental evaluation shows that our model is effective for filtering noisy trajectories, since it outputs higher anomaly scores for both spatially and temporally anomalous trajectories than for regular ones.
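
As an illustration of the entropy-based scoring described above, the following minimal sketch computes an anomaly score from a route classifier's output probabilities. It is not the STOD implementation: the function name `entropy_anomaly_score`, the optional normalization by the logarithm of the number of classes, and the example probability vectors are all assumptions made for illustration only.

```python
import math


def entropy_anomaly_score(probs, normalize=True):
    """Shannon entropy of a class-probability vector, used as an anomaly score.

    `probs` stands for the route classifier's output distribution for one
    trajectory (hypothetical input). Dividing by log(number of classes) to
    map the score into [0, 1] is an assumption, not taken from the paper.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")

    # Renormalize defensively so the vector sums to 1.
    probs = [p / total for p in probs]

    # Terms with p == 0 contribute nothing to the entropy.
    h = sum(-p * math.log(p) for p in probs if p > 0)

    if normalize and len(probs) > 1:
        h /= math.log(len(probs))
    return h


# Made-up output distributions over four candidate route ids.
confident = [0.97, 0.01, 0.01, 0.01]   # the model is sure of the route
uncertain = [0.25, 0.25, 0.25, 0.25]   # the model cannot tell routes apart

print(entropy_anomaly_score(confident))
print(entropy_anomaly_score(uncertain))
```

Under these assumptions, a trajectory that the classifier assigns confidently to one route receives a low score, while a trajectory whose probability mass is spread across many routes receives a high score and would be flagged as a candidate anomaly.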