Online micro-video recommender systems aim to address the information explosion of micro-videos and make the personalized recommendation for users. However, the existing methods still have some limitations in learning representative user interests, since the multi-scale time effects, user interest group modeling, and false positive interactions are not taken into consideration. In view of this, we propose an end-to-end Multi-scale Time-aware user Interest modeling Network (MTIN). In particular, we first present an interest group routing algorithm to generate fine-grained user interest groups based on user's interaction sequence. Afterwards, to explore multi-scale time effects on user interests, we design a time-aware mask network and distill multiple temporal information by several parallel temporal masks. And then an interest mask network is introduced to aggregate fine-grained interest groups and generate the final user interest representation. At last, in the prediction unit, the user representation and micro-video candidates are fed into a deep neural network (DNN) for predictions. To demonstrate the effectiveness of our method, we conduct experiments on two publicly available datasets, and the experimental results demonstrate that our proposed model achieves substantial gains over the state-of-the-art methods.