Recommender systems have become indispensable for addressing information overload in micro-video services. They characterize users’ preferences from historical interactions and recommend micro-videos accordingly. Existing works largely leverage the multi-modal content of micro-videos to enhance recommendation performance. However, limited effort has been made to understand users’ complex behavior patterns, including their long- and short-term interests and their temporal diversity preferences. In micro-video recommendation scenarios, users tend to have both stable long-term interests and dynamic short-term interests, and may grow tired of receiving many similar recommendations in succession. In this paper, we propose a Temporal Diversity-aware micro-video recommender (TD-VideoRec) for user behavior modeling that simultaneously captures users’ long- and short-term preferences. Specifically, we first adopt a user-centric attention mechanism to model long-term interests. Then, we apply an attention network on top of a long short-term memory (LSTM) network to obtain users’ short-term interests. Finally, we introduce a temporal diversity coefficient to characterize the temporal diversity preferences reflected in users’ click behaviors. The value of the coefficient depends on the distinction between users’ long- and short-term interests, which is extracted by vector orthogonal projection. Extensive experiments on two real-world datasets demonstrate that TD-VideoRec outperforms state-of-the-art methods.
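To make the orthogonal projection step concrete, the following is a minimal sketch of how the part of a short-term interest vector that is not explained by the long-term interest vector could be isolated. It is not the formulation used in TD-VideoRec: the abstract does not give the exact definition of the coefficient, so the function names `orthogonal_component` and `diversity_coefficient`, and the choice of a norm ratio as the coefficient, are assumptions made purely for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_component(short_term, long_term):
    """Return the part of short_term that is orthogonal to long_term.

    This subtracts the projection of short_term onto long_term,
    leaving only what the long-term interest does not account for.
    """
    denom = dot(long_term, long_term)
    if denom == 0.0:
        # A zero long-term vector explains nothing, so keep everything.
        return list(short_term)
    scale = dot(short_term, long_term) / denom
    return [s - scale * l for s, l in zip(short_term, long_term)]


def diversity_coefficient(short_term, long_term):
    """Hypothetical coefficient in [0, 1] (an assumption, not the paper's).

    It is the norm of the orthogonal residual divided by the norm of
    short_term: 0 when the two interests are aligned, 1 when orthogonal.
    """
    norm_short = math.sqrt(dot(short_term, short_term))
    if norm_short == 0.0:
        return 0.0
    residual = orthogonal_component(short_term, long_term)
    return math.sqrt(dot(residual, residual)) / norm_short
```

Under these assumptions, a short-term vector `[3.0, 4.0]` and a long-term vector `[1.0, 0.0]` leave the residual `[0.0, 4.0]`, giving a coefficient of 0.8. Two parallel vectors give 0, which corresponds to a user whose recent clicks simply repeat their stable interests.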