Streaming applications are now the predominant tools for listening to music. The success of such software depends on the availability of songs and, especially, on its ability to provide users with relevant personalized recommendations. State-of-the-art music recommender systems rely mainly on either matrix factorization-based collaborative filtering or deep learning architectures. Deep learning models usually either use metadata for content-based filtering or predict the next user interaction (listening to a song) with a memory-based deep learning structure that learns from temporal sequences of user actions. Despite advances in deep learning models for song recommendation, none has taken advantage of the sequential nature of songs by learning sequence models based on their content. Beyond prediction accuracy, recent research has highlighted the importance of other aspects of recommender systems, such as explainability and the cold start problem, in which a new user or item with no prior history of interactions joins an online platform. In this work, we propose a hybrid deep learning structure, called "SeER", that uses collaborative filtering and deep sequence models on the MIDI content of songs for recommendation. Our approach aims to take advantage of the superior capabilities of recurrent neural networks, the multidimensional time series aspect of songs, and the power of matrix factorization to:
• provide more accurate personalized recommendations;
• solve the item cold start problem, which arises when a new unrated song is added to the set of choices to recommend; and