Software project scheduling is the problem of allocating employees to tasks in a software project. Due to the large scale of current software projects, many studies have investigated the use of optimization algorithms to find good software project schedules. However, despite the importance of human factors to the success of software projects, existing work has considered only a limited number of human properties when formulating software project scheduling as an optimization problem. Moreover, the changing environments of software companies mean that software project scheduling is a dynamic optimization problem. However, there is a lack of effective dynamic scheduling approaches to solve this problem. This work proposes a more realistic mathematical model for the dynamic software project scheduling problem. This model considers that skill proficiency can improve over time and, different from previous work, it considers that such improvement is affected by the employees' properties of motivation and learning ability, and by the skill difficulty. It also defines the objective of employees' satisfaction with the allocation. It is considered together with the objectives of project duration, cost, robustness and stability under a variety of practical constraints. To adapt schedules to the dynamically changing software project environments, a multi-objective two-archive memetic algorithm based on Q-learning (MOTAMAQ) is proposed to solve the problem in a proactive-rescheduling way. Different from previous work, MOTAMAQ learns the most appropriate global and local search methods to be used for different software project environment states by using Q-learning. Extensive experiments on 18 dynamic benchmark instances and 3 instances derived from real-world software projects were performed. A comparison with seven other meta-heuristic algorithms shows that the strategies used by our novel approach are very effective in improving its convergence performance in dynamic environments, while maintaining a good distribution and spread of solutions. The Q-learning-based learning mechanism can choose appropriate search operators for the different scheduling environments. We also show how different trade-offs among the five objectives can provide software managers with a deeper insight into various compromises among many objectives, and enabling them to make informed decisions.